IRC log for #gluster, 2012-10-24

All times shown according to UTC.

Time Nick Message
00:02 noob2 joined #gluster
00:03 noob2 i know a few of you guys have played around with ovirt before.  Have any of you run into a problem where gluster complains of stale nfs handles after attaching it as a storage domain?
00:05 noob2 it looks as if all the ovirt nodes are trying to write to a metadata file and causing gluster to smack some of them down
00:06 Technicool noob2, which metadata file?  for ovirt or for an app?
00:06 noob2 for ovirt
00:06 noob2 this file in particular: ade24a75-7b0b-49b8-999b-a0bf7f4132cd/dom_md/metadata
00:06 noob2 on the storage mount
00:07 Technicool hmm
00:07 Technicool replica 2 volume?
00:07 noob2 yeah it's a replicate 2
00:07 noob2 i had some problems in the past with split brain so i enabled quorum
00:07 noob2 that might be my problem
00:08 noob2 something with the way i'm rsyncing data onto the gluster occasionally causes it to split so after i turned on quorum that stopped almost all of it
00:09 Technicool thought quorum required 3 nodes?
00:09 noob2 yeah i thought the same
00:09 Technicool either way it's not immediately clear to me why the issue would occur
00:09 noob2 maybe enabling it does nothing on a 4 node gluster
00:10 Technicool this is 3.3.0 or better?
00:10 noob2 Technicool: i see a bunch of these in each ovirt node log file: [2012-10-23 20:09:54.066753] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gluster-client-15: remote operation failed: Stale NFS file handle. Path: /ade24a75-7b0b-49b8-999b-a0bf7f4132cd/dom_md/metadata (941164fa-ad59-49cb-bc07-174cbe249311)
00:10 noob2 yeah this is 3.3.0
00:11 noob2 glusterfs-server-3.3.0-6.el6.x86_64
00:11 Technicool reminds me of an issue seen with KVM previously
00:11 noob2 same kinda deal?
00:12 noob2 the interesting part is this
00:12 Technicool sort of, it was awhile back, trying to remember the details
00:12 noob2 if i add the gluster storage domain and add it to one ovirt node it's happy.  as soon as i add the second it flips out
00:13 Technicool that would make sense maybe, if the meta-data file is locked once you add the domain
00:13 noob2 yeah
00:13 noob2 it seems to flip flop back and forth
00:13 noob2 saying storage domain is on ovirt02, storage is on ovirt03
00:13 noob2 and so on
00:13 Technicool if you detach from the current ovirt node, the second one works fine from then on?
00:14 noob2 let me put 1 node into maint mode and see what it does
00:14 noob2 yeah haha, the gluster storage goes up to green after that
00:14 Technicool yeah
00:15 noob2 the nodes must be fighting over that file then
00:15 noob2 and gluster is preventing them from both writing
00:15 noob2 which is a sane thing to do
00:15 Technicool just a guess but the second node thinks there is an issue since the file is locked and it can't preempt
00:15 Technicool i don't know ovirt well enough to know the mechanics of that file and its purpose tho
00:16 Technicool we can continue this on #ovirt and see if any smart people there can assist
00:16 noob2 you're on there as well?
00:16 Technicool yes
00:16 noob2 :)
00:29 Ryan_Lane left #gluster
00:54 zhashuyu joined #gluster
01:03 bullardo joined #gluster
01:07 sashko joined #gluster
01:12 kevein joined #gluster
01:33 JoeJulian s/( without.*mounted)/,\1,
01:34 JoeJulian s/( without.*mounted)/,\1,/
01:35 JoeJulian Technicool. Sorry, lack of punctuation caused you to misunderstand. "Create the volume without the block storage being mounted in the same order it was before." should have been "Create the volume, without the block storage being mounted, in the same order it was before."
01:41 nightwalk joined #gluster
02:35 nightwalk joined #gluster
02:47 sunus joined #gluster
02:55 ika2810 joined #gluster
03:00 ika2810 joined #gluster
03:21 mohankumar joined #gluster
03:39 ika2810 joined #gluster
03:46 Eco_ joined #gluster
04:12 saz joined #gluster
04:32 deepakcs joined #gluster
04:37 sripathi joined #gluster
04:37 mdarade1 joined #gluster
05:00 overclk joined #gluster
05:17 lng joined #gluster
05:18 lng Hi! What does it mean? �{]f���N�root@ip-10-152-15-97:/opt/webapps/animallife_live_v1-0-0# cat /storage/1430000/1438000/1438200/1438227/game.dat
05:18 lng the file is there
05:19 lng -rw-r--r-- 1 www-data www-data 6978 Oct 23 06:42 /storage/1430000/1438000/1438200/1438227/game.dat
05:19 lng but I cannot read it
05:20 lng I have mounted my test storage...
05:20 lng cat: /storage/1430000/1438000/1438200/1438227/game.dat: Input/output error
05:22 Humble joined #gluster
05:45 bulde1 joined #gluster
05:48 bulde1 joined #gluster
06:18 bulde1 joined #gluster
06:46 ctria joined #gluster
06:52 guigui3 joined #gluster
06:58 Azrael808 joined #gluster
07:03 zhashuyu joined #gluster
07:07 vincent_vdk joined #gluster
07:08 dobber joined #gluster
07:10 Humble joined #gluster
07:24 Nr18 joined #gluster
07:26 guigui3 joined #gluster
07:34 TheHaven joined #gluster
07:34 ondergetekende joined #gluster
07:47 andreask joined #gluster
07:54 36DACB5Q1 joined #gluster
07:54 lkoranda joined #gluster
07:57 Triade joined #gluster
08:06 Tarok joined #gluster
08:07 Tarok_ joined #gluster
08:07 sunus joined #gluster
08:10 ika2810 left #gluster
08:19 royh joined #gluster
08:19 royh if the node that i've mounted falls down what happens?
08:20 royh obviously the cluster still works, but what happens to my application? it can't reach the gluster cluster any more? any way to have the other servers automatically take over the ip of the server that is down?
08:32 hyt joined #gluster
08:34 berend joined #gluster
08:40 maxiepax in the documentation at http://gluster.org/community/documentation/index.php/QuickStart step 4 says to wget a yum repo file, this repo file however points to a folder that does not exist. the error is it should be /glusterfs/ in the path but it says /glusterfs/glusterfs/ (twice)
08:41 maxiepax anyone here that works with the webpage that can confirm, and fix this?
08:44 maxiepax yum install glusterfs{,fuse,server} this command (same guide) does not work since packages have been renamed? should be yum install glusterfs{,-fuse,-server}
08:45 TheHaven joined #gluster
09:09 ndevos royh: if you mount over glusterfs, the glusterfs (fuse) client will do the fail-over automatically
09:10 ndevos royh: but, if you mount over nfs or cifs, you will need a virtual-ip and use ctdb or ucarp or something like that
09:11 royh ndevos: so it's probably a good thing to have the individual servers use a fqdn and available from where you're gonna use your gluster cluster?
09:11 ndevos maxiepax: thats a known issue and people are trying to get it corrected
09:11 royh ndevos: yeah, i was thinking about setting up ucarp or using a HA feature in BigIP
09:12 ndevos royh: any glusterfs client should be able to talk to all the storage servers directly, resolving the fqdn should work for that too
09:14 royh ndevos: so when you interface through fuse you don't speak with one server directly for all the traffic, you speak with all the servers you are getting data from directly?
09:15 royh in other words, a VIP is a bad idea for the fuse client?
09:22 maxiepax ndevos: thanks :)
09:22 ndevos royh: the vip will only be used by the fuse client for mounting, after that the fuse client knows the structure of the volume and contacts the bricks directly
09:23 royh ndevos: ah. that makes sense :)
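    (For reference, a minimal sketch of such a fuse mount: the named server is only used to fetch the volume definition, and a second volfile server can be listed so mounting still works if the first is down. Option name as in the 3.3-era mount.glusterfs script; hostnames and volume name are placeholders.)

        mount -t glusterfs server1:/myvol /mnt/myvol -o backupvolfile-server=server2

    (After mounting, the client talks to every brick directly, so no VIP is needed for the data path.)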
09:23 maxiepax is getting error 107 when trying to probe a peer also a known issue? can't find anyone while googling that has a solution.
09:23 royh ndevos: thanks a bunch!
09:24 ndevos royh: you're welcome :)
09:25 ndevos maxiepax: 107 means "ENOTCONN: Transport endpoint is not connected", maybe glusterd is not running on the other server or has a firewall?
09:27 maxiepax ndevos: iptables -L shows no blocking rules, and service glusterd status shows that the service is running on all 3 nodes. (same error on all 3). checked dns and ptr, both work, also tried using ip's, still didnt work.
09:31 ndevos maxiepax: very awkward... did you check the logs already? /var/log/glusterfs/etc-glusterd... (or similar) might give some clues
09:32 ndevos maxiepax: oh, maybe... the 'gluster' command tries to talk to localhost:24007 (on IPv4), does localhost resolve ok?
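    (A quick way to test both guesses from the affected node; assumes getent and a netcat with -z are installed.)

        getent hosts localhost                                   # should resolve to 127.0.0.1
        nc -z localhost 24007 && echo "glusterd reachable on 24007"

    (If localhost does not resolve, peer probes can fail even though the remote glusterd is fine.)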
09:38 ondergetekende joined #gluster
09:39 glusterbot New news from newglusterbugs: [Bug 869559] Pacemaker resource agents for glusterd and volumes <https://bugzilla.redhat.com/show_bug.cgi?id=869559>
09:53 gbrand_ joined #gluster
10:00 sripathi joined #gluster
10:00 duerF joined #gluster
10:01 mohankumar joined #gluster
10:23 36DACB5Q1 left #gluster
10:29 stickyboy joined #gluster
10:47 maxiepax ndevos: sorry had to go to lunch, it was a localhost problem :)
10:48 ndevos maxiepax: ah, thanks for letting me know
10:48 * ndevos will have lunch now
11:11 pkoro joined #gluster
11:22 sshaaf joined #gluster
11:25 deckid joined #gluster
11:31 ninkotech_ joined #gluster
11:35 tru_tru joined #gluster
11:40 Alpinist joined #gluster
11:44 Psi-Jack semiosis: Handy? What's mounting-glusterfs's status after a successful automount at bootup look like? start/started, stop/waiting, or what? ;)
12:03 dastar_ joined #gluster
12:09 balunasj joined #gluster
12:10 Fabiom joined #gluster
12:26 guyvs joined #gluster
12:44 guyvs joined #gluster
12:49 plarsen joined #gluster
12:51 plarsen joined #gluster
12:55 aliguori joined #gluster
12:57 gbrand_ joined #gluster
13:28 ndevos maxiepax: johnmark just fixed the .repo files
13:34 Nr18 joined #gluster
13:42 Psi-Jack semiosis: Looks like I may've gotten it. A minimalized version of wait-for-state that's blocking mounting of glusterfs, but doesn't hold up the bootup more than 10 seconds (in case of failure)
13:48 semiosis :O
13:49 Psi-Jack Heh. The bad thing is, it wait-for-state was looking for start/started, but what glusterd was actually in is start/running. ;)
13:50 Psi-Jack So, that's probably actually a big in wait-for-state.conf from 12.04. ;)
13:50 Psi-Jack bug*
13:50 semiosis Psi-Jack: well, fwiw, unless you had some issue with the existing glusterd start expression, i'd suggest keeping it rather than reinventing it
13:54 Psi-Jack I did have a problem with it. It's unreliable to depend on it. ;)
13:54 semiosis well you could certainly file a  bug with ubuntu about that
13:54 semiosis glusterbot: awesome
13:54 glusterbot semiosis: ohhh yeeaah
13:54 saz joined #gluster
13:54 Psi-Jack Hehe, yeah, wait-for-state, I plan to, cause scanning for started alone will almost never happen. ;)
13:54 Psi-Jack Especially every 30 seconds, by it's default.
13:58 jbrooks joined #gluster
13:59 lowtax2 change it from start/running to start/started
13:59 Psi-Jack lowtax2: Heh, no, that was the problem, it was start/started, but needed to be start/running. :)
14:00 lowtax2 ok
14:00 jbrooks joined #gluster
14:02 stopbit joined #gluster
14:04 * semiosis contemplates the semantics of started vs. running w/r/t a mount
14:10 Psi-Jack semiosis: This also fixes the issues on 12.04 and such, because if for whatever reason glusterd fails to start, this will not block system bootup. So my fixes actually fixes both 10.04 and 12.04. ;)
14:10 semiosis Psi-Jack: neat, though usually people would use nobootwait option on their glusterfs mounts in fstab for that purpose
14:10 Psi-Jack As mountall blocks boot up until everything is completed. ;)
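    (A rough sketch of the kind of upstart job being described: hold the glusterfs mount until glusterd reports start/running, but never for more than ten seconds, so a failed glusterd cannot hang boot. Job name, event and states are assumptions, not the actual files from this conversation.)

        # /etc/init/wait-for-glusterd.conf (hypothetical)
        description "delay glusterfs mounts until glusterd is running, 10s max"
        start on mounting TYPE=glusterfs
        task
        script
            for i in $(seq 1 10); do
                status glusterd 2>/dev/null | grep -q "start/running" && exit 0
                sleep 1
            done
            exit 0   # give up quietly rather than block mountall
        end script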
14:16 wushudoin joined #gluster
14:23 mohankumar joined #gluster
14:24 jdarcy Looks like Cloudera updated the HDFS API to support Impala, and we'll have to keep up.  :(
14:26 kashyap joined #gluster
14:27 gbrand__ joined #gluster
14:32 Teknix joined #gluster
14:33 Psi-Jack Interesting.
14:35 Psi-Jack So I'm running bonnie++ on my gluster mount from a remote client connected to 2 glusterfs servers. Load averages on mfgfs01 is peaking to 2.7, while load average on mfgfs02 is barely even getting beyond 0.5
14:37 Teknix joined #gluster
14:37 forest joined #gluster
14:38 dbruhn_ joined #gluster
14:39 dbruhn_ Should I be watching out for anything while upgrading to 3.3.1.
14:43 ndevos Psi-Jack: if you have a distributed volume, maybe bonnie++ writes just to one brick?
14:46 Psi-Jack It's replicated, actually.
14:46 Psi-Jack replica 2
14:47 rwheeler joined #gluster
14:47 raghu joined #gluster
14:47 semiosis Psi-Jack: are files being written to both bricks?
14:48 Psi-Jack Yep
14:49 Psi-Jack Err, wait..
14:49 Psi-Jack No.... No it bloudy well isn't..
14:49 Psi-Jack Oh nevermind, yes, it was. it was just done with bonnie++ already. ;}
14:50 Psi-Jack Cause my mount is /data, I made the directory /data/perf, and it does in fact appear on both servers.
14:50 Psi-Jack touch a file in /data, appears in both.
14:51 gbrand__ joined #gluster
14:54 Psi-Jack Eh well, I'll go ahead and slap monitoring on these servers. Get Zabbix to show me what's going on. ;)
15:06 lowtax joined #gluster
15:08 daMaestro joined #gluster
15:30 Triade joined #gluster
15:38 lkoranda joined #gluster
15:49 kkeithley joined #gluster
15:51 z00dax joined #gluster
15:51 z00dax hello
15:51 glusterbot z00dax: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
15:55 samkottler z00dax: I think you just got bot-served :P
15:57 z00dax samkottler: :( looks like it
16:03 dastar joined #gluster
16:04 blendedbychris joined #gluster
16:04 blendedbychris joined #gluster
16:07 jbrooks joined #gluster
16:08 jbrooks joined #gluster
16:08 steven_ joined #gluster
16:08 bfoster joined #gluster
16:08 t35t0r joined #gluster
16:08 t35t0r joined #gluster
16:08 cyberbootje joined #gluster
16:08 helloadam joined #gluster
16:09 stopbit joined #gluster
16:10 crashmag joined #gluster
16:11 semiosis hi z00dax
16:11 * semiosis = @pragmaticism
16:11 semiosis welcome to #gluster
16:14 z00dax hey, howse it going
16:15 sashko joined #gluster
16:18 Tarok_ joined #gluster
16:18 semiosis pretty good, thx.  you?
16:19 tmirks joined #gluster
16:25 z00dax not bad.
16:25 Psi-Jack Hmmm. Somehow I know z00dax. Just can't remember from where. ;)
16:25 z00dax so, glusterfs-3.3.0-3.el5 and the same for .el6 are the rpms i'm using
16:25 overclk joined #gluster
16:26 z00dax and the problem i have is that even with a single node, something goes super slow over the fuse interface
16:26 z00dax if i setup a single machine, run a createrepo job on there, it completes in ~ 12 minutes
16:27 z00dax mount a glusterfs export from the same machine, and that times goes to like 28 to 35 minutes
16:27 z00dax this isnt a super fast machine, its an opteron 2218 with 16gb of ram and a 3ware raid running 4 500gb seagate sata7200's in a raid10
16:28 z00dax with xfs as the base filesytem on both the locally accessed lv, and the lv exported via gluster
16:28 Mo_ joined #gluster
16:29 z00dax fanning out a bit, moving to a 4 machines doing 2 bricks each ( these are all opteron 2218's with the same'ish configs ) and mounting on a single node, doing the same workload and timing is down to 8 minutes ( after waiting for the rebalance etc to finish and network i/o to die out completely )
16:30 z00dax mount the same glusterfs export onto another client ( now we have 2 clients ) and times for the job goto 25+ minutes
16:31 aliguori joined #gluster
16:31 semiosis ~pasteinfo | z00dax
16:31 glusterbot z00dax: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
16:31 pdurbin zomg, z00dax is in here!
16:31 semiosis z00dax: what kind of network connects your servers & clients/
16:31 semiosis ?
16:31 z00dax the 'storage network' is a dedicated gb switch, with jf on, and intel e1000's all around
16:32 z00dax the clients and storage nodes are all directly connected to that switch, its a el-cheapo hp
16:34 z00dax let me setup something real quick
16:36 z00dax pdurbin: hey
16:36 Psi-Jack Heh, wow..
16:37 pdurbin z00dax: thanks for all the work you do on centos
16:37 Psi-Jack iozone's pushing the CPU of both these servers hard core. 6 core VMs with 6GB RAM, perf.cache set to 4GB, load of 14.5 peaking. ;)
16:37 z00dax semiosis: https://gist.github.com/ee1156eb9b0b73b1573e
16:37 glusterbot Title: gist: ee1156eb9b0b73b1573e Gist (at gist.github.com)
16:37 Psi-Jack Nearly 10 sustained, so far. ;)
16:38 z00dax semiosis: thats kinda like my play instance - also has similar issues, uses the same code ver as 'production'
16:39 z00dax brb
16:40 semiosis ok, no replication, interesting
16:41 semiosis i'm not too familiar with createrepo so idk what kind of workload it would create
16:41 johnmark z00dax: howdy
16:41 johnmark pdurbin: +1
16:42 semiosis maybe JoeJulian or kkeithley could weigh in on that
16:42 jdarcy IIRC, createrepo is going to be pretty heavy on metadata.
16:43 semiosis or jdarcy :)
16:44 tryggvil joined #gluster
16:44 semiosis jdarcy: note this is a distribute only volume
16:44 jdarcy semiosis: Yep.
16:45 kkeithley as yum repos go, the glusterfs repo is pretty small. It doesn't take more than a few seconds to create one, so my experience probably isn't very interesting.
16:45 jdarcy So you're saying that you got 8 minutes on one client, then 25+ minutes for a second client on the same volume?
16:45 kkeithley wasn't it 8 min. for local disk
16:45 kkeithley nope, nm
16:46 jdarcy I'm reading 12 minutes for local, 28-35 for GlusterFS with a single server/volume, 8 minutes with 8 bricks on 4 servers, then 25+ for a second client on that last volume.
16:47 jdarcy I'm trying to think why the second client would be slower.
16:48 jdarcy z00dax: So you had already done a pass over this data (28-35 minutes), then expanded the volume and did a second pass?
16:48 semiosis jdarcy: could it be because one of those clients is on a glusterfs server with 1/4 of the volume local, but other client is on a non-server?
16:49 jdarcy semiosis: Maybe.
16:50 z00dax sorry guys, i just realised that one of the clients is on el5 and the other on el6, just changing that so both are on el6, and will powercycle the whole lot to set a common base again
16:50 jdarcy Hm.  It would be interesting if el5 and el6 were that different.
16:53 z00dax jdarcy: so, i can run/ rerun the job on one client, either of these machines and it stays fairly quick.
16:54 z00dax things get super slow when there are more clients with the same mounts
16:54 z00dax even when they are doing very light i/o
16:54 jdarcy The already-mounted clients get slower when others are added?
16:55 johnmark does anyone know why glusterfs uses ports below 1024?
16:55 johnmark that seems unnecessary, and in some cases, conflicts with other things
16:55 semiosis johnmark: +1
16:55 semiosis it's come up here several times, and i dont recall any good answers
16:55 jdarcy johnmark: Supposedly security.  Ports below 1024 are supposed to be privileged, but that's like 80s thinking.
16:57 z00dax jdarcy: yup
16:57 jdarcy What does it matter if a port is only usable by root when anybody can be root on their own machines and you didn't authenticate the host?  Pretty pointless, if you ask me (nobody did).
16:58 jdarcy z00dax: Well, that's a new one.  Processing...
16:58 z00dax there is also the thing about enterprise policy where people will want 'system services' to run on priv ports...
16:58 z00dax but then these are the same sort of people who dont trust dhcp and wont allow tftp on their networks
16:59 jdarcy ...but will allow their executives to do email from their insecure iphones.
16:59 johnmark haha :)
16:59 z00dax sounds about right
17:00 johnmark jdarcy: can we tell them anything other than "stfu"?
17:00 jdarcy johnmark: Unfortunately, I don't think so.  Just because the reasons are bad doesn't mean they're changeable.
17:00 johnmark lulz
17:00 johnmark that's... aweso0me
17:00 johnmark er awesome
17:01 jdarcy If we change the defaults, a whole different group of people will complain.  We can add options, but then we might have already.
17:01 johnmark jdarcy: but if another service is already listening on a port, GlusterFS can't just take it over, right?
17:01 johnmark jdarcy: ok
17:01 z00dax fwiw, most files in this gluster store are ~ 2 to 5 MB in size. however disk space used on the bricks ranges from 186GB to 231GB
17:01 johnmark jdarcy: am curious about the "options" possibility
17:02 z00dax i guess there is no reason to expect equal bits everywhere(ish)
17:02 jdarcy johnmark: The problems usually have to do with who gets there first.  If we do, we might break something else.  If something else does, it used to break us but now we "hunt" for an open port in that range.
17:02 johnmark ok
17:03 jdarcy z00dax: How many files?  I'd expect *some* variation, but 25% seems a bit strange for any non-trivial number of files.
17:03 z00dax lets see
17:03 jdarcy johnmark: Of course, the "hunting" slightly increases the possibility that we'll end up sitting on a port someone else wants.
17:04 z00dax would gluster itself have someway to tell how many files there are ?
17:05 jdarcy z00dax: That info does get aggregated through statfs, but not sure if there's a df option for it.
17:05 jdarcy Ahh, df -i
17:05 * jdarcy learned something today.
17:07 z00dax glusterfs#majha:/glusterfs-srv 1081344000  694272 1080649728    1% /srv
17:07 z00dax humm
17:07 z00dax i think were ok on the inodes free side of things
17:08 johnmark z00dax: you think? :)
17:08 jdarcy Seems like 694K files should spread things out pretty well.
17:09 z00dax i can do a rebalance overnight
17:09 jdarcy If the bricks are on their own filesystems, you could do df -i there and see how much variation there is in file counts per brick.
17:09 johnmark jdarcy: semiosis: I updated the BZ - 762989
17:09 semiosis bug 762989
17:09 jdarcy If there's a lot of variation there might be a hashing issue.  If there's not then it's probably just a matter of different file sizes and nothing to worry about.
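    (For example, on each storage node, with brick mount points as placeholders:)

        df -i /export/brick1 /export/brick2

    (Comparing the IUsed column across all bricks shows whether the file count, as opposed to the bytes, is skewed.)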
17:09 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=762989 low, high, ---, rabhat, ASSIGNED , Possibility of GlusterFS port clashes with reserved ports
17:10 jdarcy Actually if a lot of directories tend to have a large file with the same name, those will all hash the same and you'll get an imbalance.  There was a bug for that.
17:10 z00dax right, so there are lots of dupe filenames all over the fileystem
17:10 z00dax there must be like 50k files called build.log ( guessing on the number, but its going to be high )
17:11 jdarcy Aha.  Yeah, that could be it if they're large.
17:14 jdarcy Also, bug #853258
17:14 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=853258 unspecified, unspecified, ---, sgowda, MODIFIED , improve dht_fix_layout_of_directory for better rebalance
17:14 Nr18 joined #gluster
17:14 semiosis so that explains the space imbalance, but what about the odd performance?
17:15 semiosis (may explain)
17:15 z00dax the space imbalance isnt really that big a deal, it was more of an observation.
17:15 z00dax the perf is what makes life hard
17:16 kkeithley didn't avati recently merge a patch that moved all the ports above 1024?
17:16 jdarcy It seems slightly possible that they're related. Are your clients also servers, or are these separate sets of machines?
17:17 z00dax jdarcy: different machines
17:18 z00dax the clients in this case are all dell 1950's, the server machines ( cats and majha ) are both super micro opteron 2218's
17:18 johnmark kkeithley: no idea. good to know
17:18 * johnmark searches patches
17:19 jdarcy kkeithley: I just checked on my own machines earlier, and they're using plenty below 1024.
17:19 kkeithley yeah, I don't see it in my tree either. hmmm, what was that
17:20 jdarcy z00dax: Does the performance on the first client rebound if you unmount on the second (so back down to one)?
17:20 z00dax fwiw, i see glusterfs on ports :1003 - :1006
17:20 z00dax jdarcy: lets try
17:21 jdarcy Still a bit mystified TBH.  Nothing comes to mind that would care about another client being present.
17:22 z00dax i can rsync the content away, rebuild from scratch.
17:22 z00dax that might be worth doing anyway, it would allow me to give you access to the machines if you want
17:22 z00dax for now, createrepo is running ..
17:23 kkeithley the change I'm thinking of was on the head of the devel tree, not in released code.
17:24 semiosis jdarcy: locks?
17:25 jdarcy semiosis: Don'.  The second would have to be not only active but accessing the same files at the same time.
17:25 jdarcy semiosis: Don't see how. The second would have to be not only active but accessing the same files at the same time.
17:25 * jdarcy suffers from premature line-termination
17:28 z00dax the other thing is that a 'du -sh' on the backup machine that runs in about 50 odd seconds, takes a very long time on the gluster volume
17:28 z00dax the backup machine is a couple of sata's in an even older machine ( its an opteron 246he based tyan 'storage' machine )
17:28 semiosis enumerating all the files in a directory is one of the slower operations on glusterfs :/
17:32 z00dax semiosis: its still running.
17:32 z00dax 7 minutes and counting
17:33 semiosis if you're enumerating all ~650k files in the volume, that's not surprising... better to use df for that.
17:33 z00dax i think the way to push this further is to get a clean setup, from scratch and see if i can get you guys access to the machines
17:33 z00dax semiosis: this should be about 81k files
17:34 semiosis well idk exactly how long it will take, but unfortunately du (like ls -laR) is going to be slow
17:34 kkeithley Aha, it's in commit 53343d368ae826b98a9eff195e9fcbea148c948f    glusterd-portmap: adhere to IANA standards while assigning brick ports
17:37 z00dax semiosis: is there any other way to find out how much backup space i need to run the rsync :)
17:38 kkeithley bricks used to start at 24007+2, will start at 49152 in a future release
17:39 semiosis z00dax: if you want to backup everything in your volume, df, otherwise idk
17:39 jdarcy I don't suppose there's anything in the logs related to this.
17:39 z00dax ok, inotify tasks setup, hopefully this will finish in time for the backup to run through, which in turn i hope finishes before the rebalance kicks off in ~ 4 hrs
17:40 jdarcy Inotify?
17:40 elyograg kkeithley: i hope the port change won't break the ability to do no-downtime server upgrades.
17:40 z00dax jdarcy: i sent the du > /tmp/boo, which is now being watched. whatever ends up in there, will get lvcreated on the backup vol, and kick off the rsync job.
17:41 z00dax in the mean time, i need to go sort out dinner etc ( its almost 7pm here )
17:41 elyograg the fact that you can't upgrade from 3.2 to 3.3 without downtime makes me REALLY glad that I didn't have a volume in production several months ago.
17:41 jdarcy z00dax: OK, we'll keep pondering.  ;)
17:41 z00dax jdarcy: thanks
17:42 Fabiom anybody used 3.3 to host VM storage ? Results ? Comments ?
17:42 kkeithley elyograg: right now I'm not seeing how it would change anything other than which ports you have to open in your iptables firewall, for the people who use that.
17:44 elyograg i want to hurt people that schedule meetings late in the day.
17:44 Psi-Jack Fabiom: Tried 3.something, and it was insanely bad.
17:44 elyograg I was hoping to be gone by 4PM.  Just got a meeting request for 3:30PM ... and it's one I can't blow off.
17:45 y4m4 joined #gluster
17:45 Psi-Jack If you're using anything LESS than 3Gbit connectivity to your storage servers, you shouldn't even bother trying to use GlusterFS for storing VM guest disks to operate/boot off of.
17:47 Fabiom Psi-Jack: Thanks. Good to know.
17:48 Psi-Jack That's just common sense. ;)
17:48 jdarcy What is this "common sense" of which you speak?  Is it truly common?
17:49 Psi-Jack Never seems to be.
17:49 elyograg no tea: dropped.
17:49 semiosis Psi-Jack: 3.something?   i think 3.3 has a lot of improvements over <=3.2 releases specifically for hosting vm images
17:49 semiosis Fabiom: ^
17:49 jdarcy semiosis: Yep.  Granular self-heal is the big one for that use case.
17:49 Psi-Jack semiosis: 3.1 I believe was what I last used. :)
17:50 * jdarcy is working on granular-self-heal issues today.  Turns out that it wasn't so granular in one case.
17:50 Psi-Jack Topped off with the LD wrapper to allow DIRECT_IO emulation.
17:50 Psi-Jack But, still, over 1Gbit, you're only killing yourself. Not improving quality. ;)
17:50 semiosis Fabiom: yes people are using 3.3 to host vm images, but we've not yet heard from any of them :)
17:51 mxs joined #gluster
17:52 Fabiom ok. thought I would ask before I start down the testing path. I got some boxes with SAS drives. I'll see how busy this week is and fire them up and begin reporting in.
17:52 semiosis i mean, not yet since you asked
17:52 Fabiom semiosis: ah ok..
17:52 semiosis stick around, and ask again later
17:52 Psi-Jack Hehe.
17:53 semiosis @O_DIRECT
17:53 glusterbot semiosis: The problem is, fuse doesn't support direct io mode prior to the 3.4 kernel. For applications which insist on using direct-io, see this workaround: https://github.com/avati/liboindirect
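    (The workaround is an LD_PRELOAD shim that strips O_DIRECT from open() calls. A quick way to see it in action might be the following; the library path and the test file are assumptions.)

        LD_PRELOAD=/usr/local/lib/liboindirect.so dd if=/mnt/gluster/test.img of=/dev/null iflag=direct bs=1M count=10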
17:54 Psi-Jack I just got done perf testing GlusterFS 3.3 on my infrastructure. Not bad for results, really. Considering that the disks themselves are over 3Gbit FC to FC disks in the SAN. The bottleneck I'm seeing is in the 1Gbit network between the 4 systems I'm benching.
17:54 Psi-Jack Prior to 3.4? Interesting. So they fixed that. ;)
17:55 semiosis Fabiom: awesome, let us know how your tests go.
17:55 elyograg more reason to go with fedora over centos.  assuming i'd ever need direct-io, that is.  no idea whether i would.
17:56 Psi-Jack Heh, Fedora makes a really unreliable server. ;)
17:57 Fabiom semiosis: sure.
17:58 elyograg Psi-Jack: the primary reason i'm considering it is so that I can put my right-hand bricks on btrfs and have snapshot support.  recent kernels are required for the latest btrfs improvements and features.  because it'll be a distributed volume, actually using those snapshots for recovery will be painful, but that option would be available.
17:59 Psi-Jack Yeouch!
17:59 Psi-Jack Heh
17:59 jdarcy 3Gbit FC?  I remember 4, and 8, but 3?
17:59 Psi-Jack btrfs has some great idea. :)
18:00 Psi-Jack jdarcy: Ahh yes, probably the 4, then. Heck, for all I know could be 8. I don't touch the actual DC, I just manage our horde of Linux servers. :)
18:02 jdarcy Psi-Jack: It's funny how most of the people we deal with tend to be on the server side.  Storage folks (used to be one myself) tend to stay away.
18:02 Psi-Jack Heh.
18:06 Psi-Jack I know both worlds usually quite well. I know I've got a 1/4th bottleneck going on because of using glusterfs over 1Gbit ethernet instead of 10Gbit. ;)
18:16 avati_ joined #gluster
18:18 duerF joined #gluster
18:22 Technicool joined #gluster
18:23 hattenator joined #gluster
18:39 Psi-Jack Weird..
18:39 Psi-Jack My glusterfs mount is failing to mount on 3 out of 5 servers suddenly.
18:40 Psi-Jack http://dpaste.org/srpRw/
18:40 glusterbot Title: dpaste.de: Snippet #211736 (at dpaste.org)
18:40 Psi-Jack That make sense to anyone?
18:40 semiosis did you reconfigure a cache option?
18:41 Psi-Jack I did, yes.
18:41 semiosis looks like you gave a value too large
18:41 Psi-Jack Hmmmm. That's used on the clients? Holy crap.
18:41 semiosis imho it's good to leave options at default unless you have good reason to change them
18:43 Psi-Jack I thought those were server-side settings,but apparently not.
18:44 semiosis http://c2.com/cgi/wiki?RulesOfOptimization
18:44 glusterbot Title: Rules Of Optimization (at c2.com)
18:46 raghu joined #gluster
18:47 madphoenix joined #gluster
18:51 dbruhn joined #gluster
18:51 dbruhn Is there documentation on upgrading from 3.3.0-1 to 3.3.1?
18:59 balunasj joined #gluster
19:00 kkeithley not per se, no. I have the impression that most people are just doing rolling upgrades.
19:02 aliguori joined #gluster
19:03 JoeJulian If you're insane like me and run 60 bricks per server, make sure you upgrade the server rpm with --noscripts. Restarting 60 bricks simultaneously sucks.
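    (That is, something along the lines of the following, with the package file name purely illustrative, and the brick restarts then done on your own schedule:)

        rpm -Uvh --noscripts glusterfs-server-3.3.1-1.el6.x86_64.rpm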
19:04 balunasj joined #gluster
19:08 Teknix joined #gluster
19:15 lowtax left #gluster
19:15 balunasj|mtg joined #gluster
19:25 balunasj joined #gluster
19:34 jdarcy Huh.  Repair rate suddenly dropped from 36MB/s to 20MB/s, stayed there for a few minutes, then back up to 45MB/s.  No obvious reason.
19:40 H__ chunk of small files ?
19:41 jdarcy No, this was all for one big 32GB file.
19:42 jdarcy Just goes to show the importance of having software that averages this kind of glitchiness across many disks.  Single disks do weird stuff like this.
20:03 avati_ joined #gluster
20:14 UnixDev joined #gluster
20:14 UnixDev hi all, is there a way to validate a volume with replicas to make sure they are in sync?
20:18 avati_ joined #gluster
20:28 Bullardo joined #gluster
20:42 semiosis UnixDev: several ways
20:43 semiosis you can scan files looking for "dirty" xattrs
20:43 semiosis @scan script
20:43 semiosis hmm
20:43 semiosis @check script
20:43 semiosis @script
20:43 glusterbot semiosis: I do not know about 'script', but I do know about these similar topics: 'stripe'
20:43 semiosis there was a factoid for that but i dont remmeber it
20:44 semiosis UnixDev: you can also use rsync -n (no-op) to compare bricks
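    (Two sketches of what that means in practice, with brick paths and hostnames assumed: inspect the trusted.afr changelog xattrs on a brick, where non-zero values mean pending heals, or dry-run rsync one brick against its replica partner.)

        getfattr -d -m trusted.afr -e hex /export/brick1/path/to/file
        rsync -rnci --exclude=.glusterfs /export/brick1/ node2:/export/brick1/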
20:45 UnixDev I see… seems like a situation I don't want to be in.. it became this way when we were trying some failure scenarios.. there is no in-gluster way for the consistency to be fixed?
20:51 tryggvil joined #gluster
20:52 UnixDev semiosis: I tried mounting through use and doing the stat trick to all files, i see some that say "cannot stat"  Input/output error
20:52 UnixDev this happens in both hosts… is that normal?
20:52 UnixDev use=fuse*
20:55 semiosis well you asked about checking, so thats what i was giving you
20:55 semiosis but yeah, fixing is done in glusterfs, for sure
20:56 semiosis pre 3.3 you had to do a ,,(repair) otherwise files would only be healed when next accessed (through the client)
20:56 glusterbot http://www.gluster.org/community/documentation/index.php/Gluster_3.1:_Triggering_Self-Heal_on_Replicate
20:56 semiosis but now since 3.3.0 there's a self-heal daemon which should take care of that pro-actively
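    (In 3.3 that daemon's progress can also be queried from the CLI; the volume name below is a placeholder.)

        gluster volume heal myvol info              # files still pending heal
        gluster volume heal myvol info split-brain  # files it refuses to heal on its own
        gluster volume heal myvol full              # trigger a full sweep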
20:56 semiosis UnixDev: cannot stat?  gotta check client log file to see why
20:56 semiosis feel free to pastie the log
20:57 semiosis bbiab
20:57 UnixDev semiosis: I am running 3.3.0 git, self heal is running but I think its failing
20:58 semiosis pastie client logs
21:05 aliguori joined #gluster
21:08 gbrand__ joined #gluster
21:09 gbrand_ joined #gluster
21:15 Gilbs1 joined #gluster
21:16 Gilbs1 Hey gang, quick question:  Does gluster support secure deletion?
21:24 jbrooks joined #gluster
21:31 balunasj|mtg joined #gluster
21:33 UnixDev semiosis: http://pastebin.com/5PWuRCrz
21:33 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
21:34 semiosis UnixDev: split brain huh
21:34 semiosis @split-brain
21:34 glusterbot semiosis: (#1) learn how to cause split-brain here: http://goo.gl/nywzC, or (#2) To heal split-brain in 3.3, see http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ .
21:35 UnixDev yes, this is worrying me as this is only lab testing.. but it could be really bad in production
21:35 semiosis UnixDev: split brain can be avoided... does your testing equate to either of the two scenarios described in link #1?
21:36 semiosis and by "can be avoided" i mean "should be avoided by design"
21:37 UnixDev semiosis: see, but this is the issue, it never really happened, only 1 machine ever had the ip, so really only 1 could have gotten the writes
21:37 UnixDev so scenario is as follows
21:37 semiosis no
21:37 semiosis but please describe further :)
21:38 UnixDev vol1 with 2 replicas on node1 and node2
21:38 UnixDev virtual ip only lives on 1, master assigned to node1
21:38 UnixDev client mounts volt from node1 virt ip… writes happen
21:39 UnixDev node1 dies, ip moves to node 2… writes continue
21:39 semiosis nfs or fuse client?
21:39 UnixDev nfs
21:39 semiosis ok thx, go on please
21:39 UnixDev now when node1 comes back, is when split brain happens… node1 comes back, ip moves back to node 1, writes start happening
21:39 UnixDev then poof, split-brain
21:40 semiosis interesting
21:40 * semiosis ponders
21:41 semiosis please ,,(pasteinfo)
21:41 glusterbot Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
21:41 semiosis and also your client mount command
21:41 UnixDev I think the problem comes in when node 1 comes back, it needs time to sync vols before more writes happen...
21:41 UnixDev one sec
21:41 UnixDev mounting is done by vmware, don't have command
21:41 UnixDev getting vol info
21:42 semiosis then just the nfs server/path used, if not the whole mount command
21:42 UnixDev http://fpaste.org/geAK/
21:42 glusterbot Title: Viewing Paste #246296 (at fpaste.org)
21:42 UnixDev mount is 192.168.133.21 (virt ip, primary node1)
21:43 UnixDev 192.168.133.21:/vol2
21:43 semiosis wow a lot of options reconfigured
21:44 UnixDev performance tweaking :P
21:44 semiosis UnixDev: just on a hunch or did you really measure things?
21:45 UnixDev lots of measuring
21:45 semiosis wow
21:45 semiosis impressed
21:46 UnixDev now, I have the performance part working to my liking… issue is really these failure sencarios
21:46 semiosis how much improvement did you get with this config vs. defaults?
21:46 semiosis (digressing, i know, but so curious!)
21:47 UnixDev well, 10x at least difference… try testing random 4k with fio
21:47 semiosis awesome
21:47 UnixDev I will tell though, there are some other tricks too
21:47 UnixDev https://github.com/facebook/flashcache
21:47 glusterbot Title: facebook/flashcache · GitHub (at github.com)
21:47 UnixDev AWESOME.. i highly recommend that
21:48 semiosis nice.  people come in here all the time asking about performance tuning but they rarely take the time to actually measure & tune
21:48 semiosis this is great feedback
21:48 UnixDev semiosis: I'm stuck now trying to validate the possible failure scenarios, imagine a split-brain on a vol with 2k virtual machines
21:49 semiosis right right
21:49 UnixDev semiosis: I have done much measuring and tuning… and so far, that part I am very happy with
21:50 semiosis it would be really nice if you could write about the measurements & tuning publicly, that's something really lacking in our documentation
21:50 semiosis just wanted to put that out there :)
21:50 semiosis now back to the split brain issue
21:53 semiosis so if you're using nfs clients, was that the nfs.log you pastied earlier?
21:54 semiosis seems to be
21:55 UnixDev yes, that was tail of log on node1 which is mount point for clients
21:55 UnixDev nfs.log*
21:56 aliguori joined #gluster
21:58 semiosis well i have a theory but it's just a wild guess... that the VIP is moving back to the restored node1 before the gluster-nfs server is connected to node2 (possibly before it's even started)
21:59 semiosis this presupposes that the gluster-nfs daemon begins serving client requests before it has connected to all bricks... which is somewhat reasonable but totally unverified by me
22:00 Gilbs1 Is it normal for gluster to take about 10-15 seconds to open a directory with 300+ folders.  Using samba share, via 2k8 windows server.
22:01 UnixDev semiosis: this is possible, but glusterd on node1 was never stopped.. to simulate network failure, network was stopped… in theory a write could have come in before node2 was reconnected after network was started
22:01 UnixDev Gilbs1: I've had issues like this with win 2k8, there is some setting for cifs share version or something, set it to the older one
22:02 Gilbs1 Unixdev:  Where is that setting on windows or on the smb.conf?
22:03 semiosis UnixDev: when you say network was stopped... you mean all interfaces?  what i would propose would be letting the .1x interface come up before enabling the VIP .2x, since gluster-nfs will communicate with the bricks using the .1x interface
22:04 semiosis UnixDev: that way it can establish nfs-server-to-brick connections before the VIP arrives with all its client traffic
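    (One way to enforce that ordering in the failback script would be to wait until gluster's own NFS server shows as online in volume status before raising the VIP. A sketch; the volume name and the grep pattern against 3.3's output format are both assumptions.)

        until gluster volume status vol2 nfs | grep -q 'NFS Server on localhost.*Y'; do
            sleep 2
        done
        # only then add the virtual IP back on this node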
22:05 imanelephant We're noticing an issue where rsync reading from a glusterfs vol causes gluster to crash hard. Before we dive too deeply into troubleshooting, I wanted to ask if anyone had thoughts on this
22:05 imanelephant found an article in google, but they were using rsync to write to the glusterfs, not read from it
22:08 Daxxial_ joined #gluster
22:09 UnixDev semiosis: yes, that is what happened
22:09 JoeJulian I use rsync to read from my volumes to do backups. No problem.
22:09 JoeJulian imanelephant: What version
22:10 UnixDev Gilbs1: try this, http://www.petri.co.il/how-to-disable-smb-2-on-windows-vista-or-server-2008.htm
22:10 glusterbot Title: How to Disable SMB 2.0 on Windows Vista/2008 (at www.petri.co.il)
22:11 imanelephant 3.2.5 running on Ubu 12.04
22:11 Gilbs1 Thanks, i'll give it a go.
22:11 JoeJulian imanelephant: Also can you fpaste the crashdump at the end of the log file, and can you pull a 'thread all apply bt' to the core?
22:12 JoeJulian 3.2.5 is old and relatively buggy. You should at least be running 3.2.7 if you're going to continue running the 3.2 series.
22:14 UnixDev semiosis: network service was stopped from console to simulate failure. after a few minutes, network was started again. first the gluster facing ip was started on bond0.. then the virtual ip was put on by heartbeat. ( run down of events that caused split-brain from node1 perspective )
22:14 imanelephant JoeJulian: thanks, 3.2.5 is what is packaged with Ubu 12.04, of course. Can you point me to doc on how best to upgrade it to 3.2.7 ?
22:16 JoeJulian @ppa
22:16 glusterbot JoeJulian: The official glusterfs 3.3 packages for Ubuntu are available here: https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.3
22:17 JoeJulian Either that or ,,(latest) might be good now. Not sure if this has the 3.2 branch though.
22:17 glusterbot The latest version is available at http://download.gluster.org/pub/gluster/glusterfs/LATEST/ . There is a .repo file for yum or see @ppa for ubuntu.
22:17 imanelephant thanks. would also be interested in knowing more about quick-read vs io-cache
22:20 JoeJulian The basic information here is still accurate in general terms: http://www.gluster.org/community/documentation/index.php/Translators/performance/quick-read
22:21 duerF joined #gluster
22:35 Technicool joined #gluster
22:42 Nr18 joined #gluster
22:45 UnixDev semiosis: after a fallback, will gluster sync the changes written to node2 back to node1 auto-matically ? and if so, how can this process be tracked/monitored?
22:46 sashko joined #gluster
22:46 semiosis back
22:47 samppah joined #gluster
22:49 semiosis forest: sadly there's no super easy upgrade path to 3.2.7
22:50 forest ah thanks for the honesty  ;-)
22:50 semiosis forest: are you just starting out with glusterfs?
22:50 forest is there a package we can use to start our journey into dependency hell, er I mean dependency management?
22:51 forest or should we build from source?
22:51 forest yes
22:51 forest I've used it on one project before
22:51 semiosis what i'm getting at is, could you just start fresh instead of upgrading?
22:51 forest however I worked with someone from RedHat who set it up
22:51 forest yes
22:51 semiosis then use the ,,(ppa) which has 3.3.1
22:51 glusterbot The official glusterfs 3.3 packages for Ubuntu are available here: https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.3
22:51 forest sounds preferable, even
22:52 semiosis and if you do want to upgrade, use that ppa and follow ,,(3.3 upgrade notes)
22:52 glusterbot http://vbellur.wordpress.com/2012/05/31/upgrading-to-glusterfs-3-3/
22:52 forest do you know people running this on 12.04 and can confirm it is a stable build?
22:52 semiosis but it would probably be easier to just start fresh
22:54 semiosis glusterbot says it's official
22:56 sashko left #gluster
23:10 Gilbs1 left #gluster
23:19 duerF joined #gluster
23:35 forest joined #gluster
