IRC log for #gluster, 2013-07-04

All times shown according to UTC.

Time Nick Message
00:18 vpshastry joined #gluster
00:50 kedmison joined #gluster
00:51 bala joined #gluster
00:57 joelwallis joined #gluster
01:18 __Bryan__ joined #gluster
01:31 kedmison I'm looking for some help with a problem I've had with replace-brick on Gluster 3.3.1.  replace-brick hung, so I restarted the target gluster server and am now in a state where aborting the replace-brick says my source brick does not exist in my volume.
01:32 kedmison but when I try to add a brick to the volume, I get a message saying that a replace-brick is in progress and to retry after replace-brick operation is committed or aborted.
01:33 kedmison Does anyone have any ideas how to get the replace-brick aborted so I can add a brick to the volume?
01:39 recidive joined #gluster
01:54 badone joined #gluster
02:04 zwu joined #gluster
02:09 harish joined #gluster
02:09 atrius_ joined #gluster
02:32 bala joined #gluster
02:35 recidive joined #gluster
02:37 harish joined #gluster
02:38 atrius_ joined #gluster
02:39 bharata-rao joined #gluster
02:52 kshlm joined #gluster
03:01 vshankar joined #gluster
03:05 badone joined #gluster
03:25 aknapp joined #gluster
03:26 mohankumar joined #gluster
03:43 raghu joined #gluster
03:44 badone joined #gluster
03:52 hagarth joined #gluster
03:59 JoeJulian kedmison: "replace-brick ... commit force"
04:02 kedmison the replace-brick hasn't succeeded in copying all the data; will a commit-force drop the data from the source brick?  My cluster is a distribute, not replicated, cluster so I have to be careful about data loss.
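
For reference, a minimal sketch of the replace-brick operations being discussed, using the 3.3-era CLI syntax with hypothetical volume and brick names; whether a forced commit is safe for data that has not finished migrating is exactly the question kedmison raises, so checking status first is prudent:

    # inspect the stuck operation (hypothetical volume and brick names)
    gluster volume replace-brick myvol server1:/bricks/b1 server3:/bricks/b1 status

    # abort it, which should unblock add-brick and other volume operations
    gluster volume replace-brick myvol server1:/bricks/b1 server3:/bricks/b1 abort

    # or, as suggested above, force the commit without waiting for the copy
    gluster volume replace-brick myvol server1:/bricks/b1 server3:/bricks/b1 commit force
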
04:17 badone joined #gluster
04:25 sgowda joined #gluster
04:36 shylesh joined #gluster
04:40 vpshastry joined #gluster
04:48 _pol joined #gluster
04:50 CheRi_ joined #gluster
04:53 Humble joined #gluster
05:16 satheesh joined #gluster
05:42 bulde joined #gluster
05:45 psharma joined #gluster
05:45 ppai joined #gluster
05:45 guigui1 joined #gluster
05:48 anands joined #gluster
05:53 satheesh joined #gluster
05:54 kevein joined #gluster
05:56 ramkrsna joined #gluster
05:56 ramkrsna joined #gluster
05:58 satheesh joined #gluster
06:06 jclift_ hagarth: build.gluster.org doesn't get patches?
06:08 hagarth jclift_: jenkins on build.gluster.org pulls patches from gerrit.
06:09 jclift_ hagarth: Bad wording on my part.  I meant OS patches. :D
06:11 jclift_ Meh, ignore this.  I'll figure it out, etc. :D
06:14 rgustafs joined #gluster
06:20 ngoswami joined #gluster
06:20 GrTheo joined #gluster
06:21 shireesh joined #gluster
06:21 jtux joined #gluster
06:22 GrTheo need help with a failed gluster in replication mode after a cable failure
06:22 GrTheo how Do I rebalance the bricks ??
06:25 jclift_ GrTheo: I'm pretty sure there's a command for doing that, which shows up if you do "gluster volume help"
06:25 jclift_ GrTheo: But, I've never had to use the recovery commands before, so I'm not able to give any useful advice here :(
06:26 * jclift_ has a dev/test setup that doesn't need recovery
06:26 jclift_ (like, ever :>)
06:26 GrTheo Did that ... but something is wrong ...
06:27 GrTheo cannot tell what
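
A rough sketch of the commands that usually apply here, assuming a hypothetical volume name myvol: on a replicated volume the recovery mechanism is self-heal, while rebalance is for redistributing data across the bricks of a distributed volume:

    # replicated volumes: trigger and inspect self-heal (GlusterFS 3.3+)
    gluster volume heal myvol            # heal files flagged as needing it
    gluster volume heal myvol full       # force a full sweep of the volume
    gluster volume heal myvol info       # list entries still pending heal

    # distributed volumes: rebalance (e.g. after adding bricks)
    gluster volume rebalance myvol start
    gluster volume rebalance myvol status
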
06:27 jclift_ semiosis: Got a sec? ^^^
06:28 bala joined #gluster
06:29 GrTheo what is that?? doing an ls on a mounted gluster volume .... ls: reading directory .: Transport endpoint is not connected
06:29 jclift_ Gah, I get that fairly often too when I've been breaking gluster (in dev builds tho)
06:30 jclift_ I *think* it means Gluster has lost the connection between the client and the server.
06:30 jclift_ Try to unmount it, then remount it
06:30 GrTheo ok
06:30 jclift_ I know, sounds stupid, but it might work
06:30 hagarth jclift_: that is right. how about adding to a FAQ page somewhere?
06:31 * jclift_ adds a note to create an FAQ entry about it.
06:31 * jclift_ adds another note, to create an FAQ page while I'm at it
06:31 hagarth jclift_: gracias!
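
The remount suggested above, spelled out with a hypothetical volume and mount point:

    # the FUSE client has lost its connection to the volume; remount it
    umount /mnt/gluster                     # add -l (lazy) if the mount point is busy
    mount -t glusterfs server1:/myvol /mnt/gluster
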
06:32 GrTheo meanwhile I still don't know how to handle this "unknown option _netdev"
06:33 FilipeMaia joined #gluster
06:34 hagarth GrTheo: you can ignore that if your mount is happening.
06:35 jclift_ ^^ This sounds like another potential FAQ point too
06:35 GrTheo getting an ls on the mounted volume is veeeeeeryyyyy slooooooow ....
06:35 jclift_ Tonnes of files in the directory?
06:36 GrTheo volume contains 5 years mail maildir files dovecot is operating on the other end
06:38 jclift_ GrTheo: Is dovecot setup to save each message in an individual file, or is it just a few files per user?
06:38 GrTheo maildir setup is each mail on an individual file .
06:39 jclift_ Asking because Gluster's most non-optimal scenario is using it with tonnes of small files.  eg generally recommended to NOT use Gluster for news servers, and email servers with 1 file per message
06:39 jclift_ Well then... :/
06:40 jclift_ Technically it'll work... but it's not going to be fast.  Or even "medium" really. :(
06:40 GrTheo how can I find out what is really going on ... ??? gluster volume ???
06:41 jclift_ GrTheo: That being said, Gluster 3.4 series (still in dev) is faster than 3.3
06:42 jclift_ GrTheo: I really don't know.  No familiarity at all yet with the gluster commands for rebalancing, healing, or anything to do with it
06:42 jclift_ Hopefully someone else is around and has time to help
06:43 GrTheo tcpdump on the gluster interface does not show a big deal of packets  traveling
06:45 mooperd joined #gluster
06:47 GrTheo volume status gives operation failed .... anybody ????
06:47 ndevos @learn _netdev as The mount-option "_netdev" is checked on RHEL based distributions in /etc/rc.sysinit and /etc/init.d/netfs, in Fedora systemd handles it. Older versions of /sbin/mount.glusterfs warn that it ignores the "_netdev" option, which is what it should do, so this warning has been silenced with bug 827121 (in glusterfs-3.4).
06:47 glusterbot ndevos: The operation succeeded.
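
For context, _netdev is an /etc/fstab option telling the init scripts (or systemd) to wait for the network before mounting; a typical fstab entry might look like this, with hypothetical server, volume and mount point names:

    # /etc/fstab
    server1:/myvol  /mnt/gluster  glusterfs  defaults,_netdev  0 0
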
06:49 hagarth GrTheo: are any of your ports blocked by firewall rules?
06:49 GrTheo there is no FW btw the nodes
06:50 ollivera joined #gluster
06:58 ctria joined #gluster
07:00 ekuric joined #gluster
07:01 deepakcs joined #gluster
07:07 jtux joined #gluster
07:10 ricky-ticky joined #gluster
07:13 mooperd joined #gluster
07:24 andreask joined #gluster
07:36 dobber_ joined #gluster
07:44 ProT-0-TypE joined #gluster
07:51 asriram joined #gluster
08:06 satheesh1 joined #gluster
08:20 RobertLaptop joined #gluster
08:21 fidevo joined #gluster
08:27 vimal joined #gluster
08:28 rastar joined #gluster
08:29 CheRi_ joined #gluster
08:32 dobber___ joined #gluster
08:36 ujjain joined #gluster
08:37 FilipeMaia joined #gluster
08:40 badone joined #gluster
08:51 harish joined #gluster
09:10 mooperd joined #gluster
09:22 ccha about nfs and ucarp: ucarp only checks whether the server is up or down, but what about checking the gluster port?
09:25 efries_ joined #gluster
09:25 satheesh joined #gluster
09:25 ninkotech__ joined #gluster
09:27 ProT-O-TypE joined #gluster
09:27 Peanut_ joined #gluster
09:27 johnmark_ joined #gluster
09:27 twx_ joined #gluster
09:27 _Dave2_ joined #gluster
09:31 badone joined #gluster
09:32 saurabh joined #gluster
09:32 Humble joined #gluster
09:32 krokarion joined #gluster
09:33 mriv joined #gluster
09:33 Humble joined #gluster
09:36 Humble joined #gluster
09:37 a2 joined #gluster
09:38 CheRi_ joined #gluster
09:38 ctria joined #gluster
09:39 rwheeler joined #gluster
09:39 hagarth joined #gluster
09:39 glusterbot New news from newglusterbugs: [Bug 981221] NFS is picking up geo-rep's already open (read-only) file descriptor as an anonymous FD <http://goo.gl/kw27N>
09:41 duerF joined #gluster
10:00 darshan joined #gluster
10:11 spider_fingers joined #gluster
10:15 GrTheo joined #gluster
10:16 GrTheo could someone indicate  a  reliable split-brain  recovery HOWTO ?? a URL maybe ??
10:21 GrTheo I think I am trapped in a piece of software that seems to be unreliable
10:22 CheRi_ joined #gluster
10:26 darshan joined #gluster
10:32 andreask GrTheo: this may help: http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
10:32 glusterbot <http://goo.gl/FPFUX> (at joejulian.name)
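
Roughly, the 3.3 procedure that post describes: decide which replica's copy of an affected file to discard, then on that brick remove both the file and its hard link under .glusterfs, and let self-heal copy the good replica back. A sketch with hypothetical brick and volume names:

    BRICK=/bricks/brick1                          # hypothetical brick path
    BAD="$BRICK/path/to/split-brain/file"         # the copy chosen to discard
    # find its gfid hard link under .glusterfs before deleting anything
    GFIDLINK=$(find "$BRICK/.glusterfs" -samefile "$BAD")
    rm "$BAD" "$GFIDLINK"
    # trigger self-heal so the surviving good copy is replicated back
    gluster volume heal myvol
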
10:45 vimal joined #gluster
10:46 edward1 joined #gluster
11:03 tziOm joined #gluster
11:08 ccha joined #gluster
11:10 rcheleguini joined #gluster
11:12 rgustafs joined #gluster
11:16 GrTheo after gluster volume heal <vol> info I'm getting a bunch of gfid:345c412e-0c49-48a1-bf8a-f7af28556aa7 entries; what are these and how do I resolve them
11:17 GrTheo ??
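
Those entries identify files only by their GlusterFS UUID (gfid). On the brick each gfid exists as a hard link under the hidden .glusterfs directory, so the real path can usually be recovered like this (hypothetical brick path, gfid taken from the message above):

    BRICK=/bricks/brick1                          # hypothetical brick path
    GFID=345c412e-0c49-48a1-bf8a-f7af28556aa7
    # the gfid file lives at .glusterfs/<first 2 hex chars>/<next 2>/<gfid>
    ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
    # for regular files it is a hard link, so locate the real path with:
    find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -samefile \
        "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -print
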
11:19 CheRi_ joined #gluster
11:27 Alpinist joined #gluster
11:37 spider_fingers joined #gluster
11:37 spider_fingers left #gluster
11:41 manik joined #gluster
11:43 andreask joined #gluster
11:45 joelwallis joined #gluster
11:50 GrTheo sorry guys ... this thing is a fail and a drop for me ... nevertheless I'll try my best to inform other people for the issues I've found out ... what can I say I am speechless
11:50 GrTheo SOFTWARE FROM INDIA ???
11:54 GrTheo Oh God ... this sucks and the all that surrounds it
11:54 GrTheo left #gluster
11:56 samppah what
11:59 stickyboy Is there any way to mount a volume as read only with the FUSE client?  Gluster 3.3.1.
12:01 ndevos stickyboy: that is bug 853895 (recent kernels) and bug 980770 (for older kernels)
12:01 glusterbot Bug http://goo.gl/xCkfr medium, medium, ---, csaba, ON_QA , CLI: read only glusterfs mount fails
12:01 glusterbot Bug http://goo.gl/nTFRU medium, unspecified, ---, ndevos, MODIFIED , GlusterFS native client fails to mount a volume read-only
12:01 stickyboy ndevos: Okie doke.  Thanks.
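
For reference, the mount in question and one possible workaround, with hypothetical names; per the bug reports above, the native client's ro option is broken on the affected 3.3/3.4 builds, whereas a read-only NFS mount is handled on the NFS client side:

    # native (FUSE) client, read-only -- fails on releases hit by the bugs above
    mount -t glusterfs -o ro server1:/myvol /mnt/gluster

    # possible workaround: mount the volume read-only over NFSv3 instead
    mount -t nfs -o ro,vers=3 server1:/myvol /mnt/gluster
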
12:22 manik joined #gluster
12:23 hagarth joined #gluster
12:50 sprachgenerator joined #gluster
12:51 recidive joined #gluster
12:51 ccha with gluster nfs + ucarp, what about when the server is still up but something is wrong with glusterfs on that server?
12:58 samppah good question
12:58 samppah afaik there's no way for ucarp to see if something is wrong
13:05 sprachgenerator left #gluster
13:08 sprachgenerator joined #gluster
13:15 deepakcs joined #gluster
13:30 joelwallis joined #gluster
13:38 puebele joined #gluster
13:39 failshell joined #gluster
13:40 failshell joined #gluster
13:44 sprachgenerator joined #gluster
13:50 joelwallis joined #gluster
13:51 kedmison joined #gluster
13:52 hybrid512 joined #gluster
13:53 asriram joined #gluster
13:55 andreask joined #gluster
13:57 Humble joined #gluster
14:02 mohankumar joined #gluster
14:36 ctria joined #gluster
14:37 jag3773 joined #gluster
14:59 spider_fingers joined #gluster
14:59 spider_fingers left #gluster
15:08 sprachgenerator joined #gluster
15:11 18VABEU87 joined #gluster
15:11 manik1 joined #gluster
15:15 swaT30 joined #gluster
15:18 sprachgenerator left #gluster
15:20 kedmison joined #gluster
15:37 sprachgenerator joined #gluster
15:41 jclift_ joined #gluster
15:55 ctria joined #gluster
15:56 Guest43193 Hmm... After changing the hostname on one of my servers, it can no longer connect to the glusterfs-server instance running on itself. Logfiles tell me that the IP address is resolved correctly, there's just no response on the port. :|
15:59 aknapp joined #gluster
16:01 jclift_ Guest43193: Any chance you have an entry for the old hostname in /etc/hosts?
16:01 Guest43193 I don't have any hostname entries in /etc/hosts
16:02 Guest43193 (Except localhost obviously)
16:02 jclift_ Oh well, was a thought.  I get exactly that behaviour on my test lab boxes when I change their IP address and forget to update the /etc/hosts. :)
16:02 CheRi_ joined #gluster
16:03 Guest43193 "gluster peer status" shows everything looking fine.
16:03 jclift_ Guest43193: If you start Glusterd up in --debug mode, does anything show up?
16:04 jclift_ Guest43193: If it helps, I normally cheat when doing the "start glusterd up in debug mode".
16:05 Guest43193 Hmm...
16:05 jclift_ Guest43193: I normally do a "ps -ef|grep gluster > gluster.args", then look at that file for the arguments to launch it with, adding "--debug" mode to them
16:05 jclift_ Guest43193: That keeps it in the foreground, and running as root
16:06 jclift_ Guest43193: From memory, whenever I have the hostname != IP address problem with my boxes, gluster doesn't start up properly anyway.
16:06 * jclift_ doesn't remember what the error says though
16:06 jclift_ But, it's hopefully something useful
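
The "cheat" described above, roughly, run as root on the server:

    # note the arguments the running glusterd was started with
    ps -ef | grep '[g]lusterd' > gluster.args
    cat gluster.args

    # stop the service, then relaunch in the foreground with --debug so
    # log output goes straight to the terminal
    service glusterd stop
    glusterd --debug
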
16:10 Guest43193 The hostname has always pointed to the right IP address though.
16:10 Guest43193 It didn't change its IP address.
16:11 Guest43193 Doing the --debug thing now, not really seeing anything glaringly obvious.
16:12 Guest43193 glusterd just says EOF from peer.
16:13 robo joined #gluster
16:13 jclift_ Ok, lets just check a super basic thing, so we're 100% absolutely sure...
16:13 jclift_ On the gluster host box, what's the result of "hostname --fqdn" ?
16:14 jclift_ Whatever the result is, cut-n-paste the full thing onto a ping command and make sure the resolved address matches.  Just to 100% make sure the host is actually seeing itself.
16:15 jclift_ You could be 100% correct.  It's just sounding a lot like the behaviour I get, so just saying... :>
16:15 Guest43193 Tried it already. And tried it again now.
16:15 jclift_ Damn
16:16 Guest43193 But it's worth noting that I don't connect to this hostname, but rather a hostname called p.hostname for the private IP of the server.
16:16 Guest43193 This worked just fine with the old hostname however, and the p. domain also resolves with ping.
16:19 jclift_ I'm not sure.  I know Gluster has a weird way of treating hostname/ip address.  You can really only use one IP address effectively with gluster on a box.  (hopefully to be fixed in gluster 3.5)
16:19 jclift_ Guest43193: As a way to check, can you change the hostname of the server to match the p.hostname one, reboot, and try starting gluster again?
16:19 jclift_ It sounds like that might get it working.
16:20 Guest43193 Aint got anything better to try for now. :)
16:20 jclift_ Guest43193: Oh hang on, have you done "netstat -nltp" to check if glusterd is actually still running and if so see which IP it's binding to?
16:20 ctria joined #gluster
16:21 jclift_ Run the netstat -nltp as root
16:21 jclift_ It should spit out the list of programs listening on TCP ports
16:21 Guest43193 Will do when the system gets up again.
16:21 jclift_ :)
16:22 jclift_ It should have the IP addresses each thing is listening on, the port(s), and the name of the program
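
The checks being suggested, run as root on the server (gluster volume status adds Gluster's own view of brick ports and whether the bricks, NFS server and self-heal daemon are online):

    # which gluster processes are listening, and on which address/port
    netstat -nltp | grep gluster

    # Gluster's own view of the volume's processes and ports
    gluster volume status
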
16:22 Guest43193 Though it seems to answer according to the server logs. I see it responds to a mount /path command when running glusterd --debug so something must be getting through.
16:22 jclift_ yeah, I'm kind of suspecting glusterd might be starting fine, but is only listening on the IP address for the old hostname
16:23 Guest43193 Which is the same IP address. :)
16:24 jclift_ k, my bad wording there
16:24 jclift_ So I kind of suspect it's starting up fine, but not listening on the IP for p.hostname.  p.hostname is a different IP isn't it?
16:24 Guest43193 This is kind of a non-critical problem for me, I *could* simply reinstall the server, set up glusterfs again, and it would all magically work. This isn't a production machine. But I'd really like to know why it happened, because we are going to start using glusterfs in production very soon.
16:25 jclift_ Sure.  Better to figure it out and have the sense of confidence.  Less "magic" involved. :D
16:27 Guest43193 glusterd is listening on 0.0.0.0:24007
16:28 Guest43193 Which seems correct.
16:28 jclift_ Interesting
16:28 jclift_ What's the output of "gluster peer status" say?
16:29 Guest43193 It says all is okay.
16:29 Guest43193 Its connected with the other server (That is also experiencing the same issue)
16:29 Guest43193 When I try to mount with nfs, it says no such file or directory...
16:30 Guest43193 But gluster volume lists the volume as available and online.
16:30 jclift_ Hmmm.  With that output from gluster peer status, is it using IP addresses or hostnames for the boxes?
16:30 Guest43193 Hostnames.
16:30 jclift_ Are they the p.hostnames ?
16:31 Guest43193 Yes.
16:31 jclift_ Damn.  I have no idea what's wrong then.
16:31 jclift_ Hopefully someone else is around, who knows how to look into this better. :)
16:32 Guest43193 A crazy guess from me: the hostname becomes part of the volume name in some not-indicated-by-logs way, so when glusterfs-fuse or nfs tries to mount it, they specify the new hostname, but glusterd thinks it's the old hostname. Or vice versa.
16:33 Guest43193 I think there's some internal identification that bugs out.
16:33 jclift_ Hmmmm, interesting thought
16:33 jclift_ Do both a find | grep "hostname" and a grep -r "hostname" on /var/lib/glusterd/
16:35 Guest43193 Done that, I changed all the hostname entries in /var/lib/glusterd when I changed the hostname.
16:36 jclift_ Maybe make a backup of /var/lib/glusterd/ then, and re-install gluster?
16:36 bulde joined #gluster
16:36 jclift_ If the new installation works, then compare the differences in /var/lib/glusterd/ so you know what went wrong. :D
16:37 _pol joined #gluster
16:40 GabrieleV joined #gluster
16:42 sprachgenerator left #gluster
16:42 FilipeMaia joined #gluster
16:43 sprachgenerator joined #gluster
16:43 sprachgenerator left #gluster
16:44 vpshastry joined #gluster
16:46 Guest43193 https://bugzilla.redhat.com/show_bug.cgi?id=765380 <- Wonder if this is related.
16:46 glusterbot <http://goo.gl/p9EKP> (at bugzilla.redhat.com)
16:46 glusterbot Bug 765380: medium, medium, ---, vshankar, POST , [glusterfs-3.3.0qa11]: pathinfo xattr using hostname causes problems for machines with same hostname
16:58 phox joined #gluster
16:59 phox so I've been having very occasional errors reading from glusterfs
16:59 phox and this is logged to go with:  W [page.c:984:__ioc_page_error] 0-projects-hydrology-scratch0-io-cache: page error for page = 0x7f51fbf6f5f0 & waitq = 0x7f51fbf7fb20
16:59 phox also getting a lot of -EBADFD crap around there
17:03 duerF joined #gluster
17:08 Guest43193 jclift_: Changing all the hostname entries in the volume files back made the volume accessible again.
17:08 jclift_ Interesting
17:14 * Guest43193 tests a pebkac hypothesis.
17:14 Debolaz joined #gluster
17:17 Debolaz jclift_: Pebkac indeed. Forgot to rename a file hidden away in a directory. :)
17:19 phox who goes around hiding files in directories?
17:19 phox I much prefer free range files, myself.
17:21 jclift_ Debolaz: Cool. :)
17:22 jclift_ phox: "Organic", free range files taste better. :D
17:30 phox I prefer to keep carbon _off_ my rotating rust =/
17:30 phox tends to short the heads out
17:31 mooperd joined #gluster
17:36 phox jclift_: Just say no to GMR files! :D
17:37 * phox petitions the NSA
17:37 phox ... they're like the FDA, but for data, right?
17:37 jclift_ Heh
17:37 jclift_ I wonder if when a US victim^H^H^H^H^H^Hcitizen loses their data, if they can request it back from the NSA's copy?
17:38 * jclift_ gets back to work
17:38 failshell lol
17:41 glusterbot New news from newglusterbugs: [Bug 961892] Compilation chain isn't honouring CFLAGS environment variable <http://goo.gl/xy5LX>
17:47 failshell im loading our images repository on RHS. millions of files. its been going for days.
17:57 Humble joined #gluster
17:59 vpshastry2 joined #gluster
18:08 mooperd joined #gluster
18:09 jclift_ Wonder if there could be a way invented to do an initial bulk load of files into Gluster.
18:09 phox jclift_: :D n1
18:09 phox jclift_: although in exactly that same vein...
18:09 phox jclift_: I recently recovered some stuff on my Wintendo I didn't know was being backed up... which is fscking creepy
18:10 phox "oh look Windows has been secretly keeping versions of my data around"... useful in this case, but I expect to be notified of that
18:10 jclift_ Kind of like how with SQL databases if a person loads everything one transaction at a time, it takes _ages_.  But an initial bulk load (not firing triggers, not in separate transactions) can be very quick.
18:10 phox jclift_: yeah you can put stuff on the brick first
18:10 phox doesn't work if you're using striping, of course
18:10 jclift_ Good point
18:11 phox then Gluster needs to do its thing... but if you're just using a single brick or mirroring or something it has no complaints about that
18:11 jclift_ The "putting stuff on the brick" doesn't set xattrs on the files tho does it?
18:11 phox no but I'd assume when you bring the brick up it does
18:11 phox and/or on access?
18:11 jclift_ No idea.  I haven't looked at this stuff yet.
18:11 phox I don't know that it's officially supported at all... but yeah I've done that here
18:12 jclift_ I seem to be walking around some of the edges of Gluster stuff first, trying to knock off the rough bits and make it more accessible for people to develop on.
18:14 jclift_ So, an ideal bulk load tool would take the input data and load it onto the nodes, creating xattr's on the way, and NOT bothering to do any stat()'s between nodes on the way.
18:14 jclift_ Said ideal tool would also support striping.
18:14 jclift_ After it's done, *then* gluster can be started on the nodes.
18:14 * jclift_ goes and writes up a BZ, requesting someone to write said tool
18:20 failshell i thought about that, rsync'ing directly on one of the bricks
18:20 failshell instead of through a mounted volume
18:20 failshell but didnt know if gluster would replicate to the other brick
18:24 jclift_ failshell: Next time, maybe try it out on a small subset (first), and see what happens?
18:24 failshell yeah will do that
18:24 failshell will actually test it now on a test volume
18:24 failshell im curious
18:25 jclift_ Good man.  Pls report back here. (so I can learn from your testing :>)
18:36 htrmeira joined #gluster
18:41 * phox needs to write some patches for mksquashfs
18:41 mooperd joined #gluster
18:41 * phox wonders who writes a tool whose (non-reconfigurable default) is to IGNORE it being unable to read a file
18:41 phox file under "derp"
18:42 phox failshell / jclift_: it'd make sense for gluster to use inotify on the brick directly but I doubt it does
18:42 phox but it'd be a clever facility for high-efficiency things like that...
18:43 jclift_ phox: I looked into that some time ago too, thinking it would be a good way to get rid of the stat() calls.  It turns out inotify isn't up to the task though, as it's only "per directory" notifications. :(
18:44 jclift_ phox: However, there's some work going on around something similar, for the geo-replication improvements.
18:44 jclift_ Doug Williams is right into that stuff, but I don't remember the details.
18:44 jclift_ phox: If there was a way to do "notifications" for a complete filesystem, that'd be useful.
18:54 jclift_ phox failshell: https://bugzilla.redhat.com/show_bug.cgi?id=981456
18:54 glusterbot <http://goo.gl/d1AFm> (at bugzilla.redhat.com)
18:54 glusterbot Bug 981456: medium, unspecified, ---, amarts, NEW , RFE: Please create an "initial offline bulk load" tool for data, for GlusterFS
18:54 failshell ah so loading it on both bricks should work
18:54 failshell i gotta try that
18:58 jclift_ Cool.  Got the Gluster Test Framework running through successfully in a CentOS 6.3 VM.
18:58 jclift_ Now to see how well it works with newer stuff, then figure out why it's not working on my actual hardware box.
18:58 failshell email shitstorm taken care of. let's get testin!
18:58 jclift_ :)
18:59 failshell another feature that would be cool would be to list the currently connected clients
19:05 jclift_ failshell: Would you be ok to create a BZ ticket, asking for that?
19:06 jclift_ failshell: Something like "For NFS, on a server we can tell which clients are connected by doing <whatever>.  We need something similar for gluster native server mounts."
19:06 Savaticus joined #gluster
19:06 failshell BZ?
19:06 failshell ah bugzilla
19:06 jclift_ Yeah, sorry.
19:06 jclift_ Red Hat lingo, as we use it so often :D
19:07 18VABEU87 I wonder ( I don't even expect an answer ) when a node breaks down  ( network or software failure  ) ... what is the procedure to resync the bricks ???
19:08 jclift_ 18VABEU87: There is a "gluster volume heal" command that I *think* is used for that.
19:08 jclift_ 18VABEU87: That being said, I do stuff in me dev/test lab here, so haven't yet had to actually use any of that healing type of stuff so far
19:09 * jclift_ isn't far into Gluster to need to break stuff on purpose yet
19:09 jclift_ s/far into/far enough into/
19:09 glusterbot jclift_: Error: I couldn't find a message matching that criteria in my history of 1000 messages.
19:09 18VABEU87 jclift_: thanks .... in my case ... it gave about 1,5 K split brains
19:10 jclift_ 18VABEU87: Yeah, I really have no idea with that part of stuff (yet)
19:10 18VABEU87 made me think alot if this is of any worth ... it gave me nothing but trouble
19:11 glusterbot New news from newglusterbugs: [Bug 981456] RFE: Please create an "initial offline bulk load" tool for data, for GlusterFS <http://goo.gl/d1AFm>
19:12 18VABEU87 So in a perfect world when everything is OK and perfect I suspect that the heal command will heal ... but on planet earth ... heal turns into hell
19:14 failshell jclift_: yeah that works if you rsync the data on the 2 bricks straight to the local disk
19:14 failshell now i need to time this
19:14 jclift_ failshell: Cool.  That would work for replica stuff then.  Stripe and distributed, not so much. ;)
19:14 failshell yeah
19:33 failshell im guessing gluster will trigger its self heal with that process, to create all the .gluster stuff
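
A rough sketch of the pre-load approach being tested here, assuming a 2-way replica volume with hypothetical names; copying straight onto the bricks bypasses the client and is not an officially documented procedure, so (as failshell does above) it is worth trying on a throwaway volume first:

    # copy the data onto the backing filesystem of each replica brick
    rsync -a /data/images/ server1:/bricks/brick1/
    rsync -a /data/images/ server2:/bricks/brick1/

    # then let Gluster build its metadata (.glusterfs entries, xattrs), e.g. by
    # walking the volume through a client mount and triggering a full heal
    mount -t glusterfs server1:/myvol /mnt/gluster
    find /mnt/gluster > /dev/null
    gluster volume heal myvol full
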
19:33 jclift_ phox: Do you have any experience with Linux kernel coding?
19:34 gluslog_ joined #gluster
19:34 paratai_ joined #gluster
19:35 * jclift_ is abstractly wondering if it would be possible to create a "wrapper" device or something in the Linux kernel, that intercepts all requests to a disk device and can send notifications
19:35 jclift_ eg like inotify, but for all disk operations
19:36 JordanHackworth_ joined #gluster
19:36 jclift_ Probably make more sense to have code added to the xfs driver itself though (but less flexible)
19:36 jclift_ Oh well
19:36 georgeh|workstat joined #gluster
19:36 matiz_ joined #gluster
19:36 VeggieMeat_ joined #gluster
19:36 the-me_ joined #gluster
19:37 chlunde joined #gluster
19:37 penglish2 joined #gluster
19:37 avati_ joined #gluster
19:37 phox jclift_: nope.  but I have experience reverse engineering the VM subsystem once upon a time....
19:37 GLHMarmo1 joined #gluster
19:38 jclift_ Heh, that sounds very time consuming ;)
19:38 phox jclift_: there's no reason you couldn't make a unionfs-like layer and basically bind/union/whatever-mount stuff
19:38 samppah_ joined #gluster
19:39 phox but basically this seems to be a slippery slope towards having a native gluster driver (instead of fuse) which -might- only work on the server node(s)
19:39 Ramereth1home joined #gluster
19:39 phox jclift_: yeah it took like 6 weeks or something.
19:39 phox jclift_: I was figuring out feasibility of implementing cache coherency equivalent semantics across NUMA nodes
19:39 jclift_ Thinking about this more (while waiting for a VM to do stuff), we could probably also create a Gluster translator that intercepts calls, and sends notifications of file writes to all of the other servers in a cluster.  Then rip out the stat() stuff in the main Gluster code.
19:40 phox jclift_: so for example shared libraries could be mapped in in memory local to any particular NUMA node, because they're not freaking writable anyways
19:40 jclift_ phox: Interesting
19:40 phox jclift_: yeah.  it would be worth doing, but I don't have time for it
19:40 phox the performance benefits were not totally insignificant.
19:40 jclift_ Yeah, to both
19:41 jclift_ both == the in-memory mapping of local shared libraries, and creating a gluster translator to do proactive notification of changed files
19:41 phox what would make more sense is having a fake-mount layer that lets a filesystem be mounted WITHOUT actually exposing it through VFS
19:41 rcheleguini joined #gluster
19:41 glusterbot New news from newglusterbugs: [Bug 958781] KVM guest I/O errors with xfs backed gluster volumes <http://goo.gl/4Goa9>
19:42 jclift_ phox: That's confusing sounding ?
19:42 phox so instead of VFS "owning" the mounted FS, gluster (which would then need to have some in-kernel shiz) would, and then it could actually assume that nothing had changed behind its back
19:42 jclift_ Still confusing. :D
19:42 phox so there would be no need for it wondering if the underlying brick FS was undisturbed, so it could skip stat()ing stuff altogether
19:43 phox it changes gluster's assumptions in more or less the same way, only it becomes the ONLY interface to the mounted brick FS, so it doesn't need notify because it IS the gatekeeper
19:43 jclift_ How would that stop the need for stat() calls between nodes?
19:43 phox jclift_: this just means gluster would be the sole gatekeeper to the brick FS, which would allow gluster to implement what you just described internally without notify()
19:44 jclift_ phox: Maybe we have different understanding of the stat() call problem Gluster has.
19:44 jclift_ phox: From my perspective, I *think* the problem is that in a multi-node Gluster cluster...
19:44 phox jclift_: stat() sucks on a non-multi-node Gluster installation
19:45 phox FWIW
19:45 jclift_ phox: A client could have written to any of the Gluster nodes, without actually having updated all of them correctly
19:45 jclift_ Interesting btw, didn't know that either
19:45 phox i.e. listing a dir with 20k files in it can take a few minutes here, and our gluster doesn't have multiple nodes
19:46 phox rm: cannot remove `bc_bcsd_19502099/done/hadgem1_20c_A2_daily/hadgem1_52.40625_-123.90625': Transport endpoint is not connected
19:46 phox seriously
19:46 jclift_ phox: So, the stat() calls between nodes (often triggered by ls -la) is for nodes to check "what's the latest timestamp and file info for this file XXX?" for every single file in a directory
19:46 phox gluster is being flakey as anything right now =/
19:46 phox yeah basically I think the solution there is for gluster to have its own cache of FS metadata without having to hit the brick's directories for this
19:46 jclift_ Yeah, prior to the rdma dev code rewrite, I kept on getting that all the time with my test gear when doing dd on things to measure perf
19:47 phox FWIW this is local
19:47 jclift_ Ugh
19:47 phox i.e. vol mounted back on its own server
19:47 failshell any 'tricks' to use du ?
19:47 jclift_ Yeah.  It's not really a problem with the network layer
19:47 phox it IS hitting the IPoIB IP, but that shouldn't hit the hardware stack
19:47 failshell similar to not using ls -l?
19:47 jclift_ failshell: No idea unfortunately :(
19:47 * phox nods
19:47 phox jclift_: so what's the status of RDMA
19:48 phox I understand whatshisname in RH land had some patches out but it didn't look like RDMA was actually reenabled
19:48 jclift_ Well, there's been a fairly general rewrite of the underlying RDMA code
19:48 phox I would seriously love to be running RDMA right now
19:48 phox probably quadruple our throughput and drop latency a lot
19:48 failshell i would love anything but 1Gbps network ..
19:48 jclift_ The patch was merged ummm... maybe 2 or 3 weeks ago?
19:48 phox k
19:48 phox still not in a release version I take it
19:48 jclift_ No
19:48 phox this is in 3.3.x or 3.4.x?
19:48 phox or both?
19:49 jclift_ git dev
19:49 jclift_ So, "what will become 3.5"
19:49 phox ah
19:49 jclift_ But, they're backporting it to 3.4 too
19:49 phox any idea if there's been any indication of timeline?
19:49 phox k
19:49 phox WFM either way
19:49 phox brb
19:49 jclift_ Actually, not you mention it I know it's already been backported to 3.4 series branch, and was in the 3.4.0 beta3 rpms (and now 3.4.0 beta4)
19:49 jclift_ s/not/now/
19:49 jclift_ Sure, np
19:50 glusterbot What jclift_ meant to say was: Actually, now you mention it I know it's already been backported to 3.4 series branch, and was in the 3.4.0 beta3 rpms (and now 3.4.0 beta4)
19:50 jclift_ phox: So, I'm finding the RDMA code (in git master) was more stable than the previous RDMA code
19:51 jclift_ But, I'm having weird performance issues with it.  Good write speeds with a stripe 2 (1.5x individual brick speed), crap read speeds (1/3 individual brick speed)
19:51 Ramereth joined #gluster
19:52 jclift_ So, working on getting the Gluster Test Suite/Framework thing to operate on RDMA now.  It's been tcp only so far.
19:52 phox hm, k
19:53 htrmeira_ joined #gluster
19:53 phox ours is probably going to be single-brick for the foreseeable future
19:53 phox well, single-brick-per-vol
19:53 phox heh
19:53 jclift_ I have no idea at all why single brick stuff isn't decent speeds. :(
19:54 jclift_ Some time soon I want to measure the RDMA volumes with the various translators individually enabled.
19:54 jclift_ eg find out if some translator in the stack (client or server side) is doing anything dumb with copying
19:54 JoeJulian_ joined #gluster
19:55 jclift_ (thus my crap read speed)
19:55 georgeh|workstat joined #gluster
19:57 bstr_ joined #gluster
19:59 mjrosenb joined #gluster
20:00 eryc joined #gluster
20:00 eryc joined #gluster
20:08 18VABEU87 left #gluster
20:16 fleducquede joined #gluster
20:23 kedmison joined #gluster
20:23 recidive joined #gluster
20:24 * phox is not liking how long crap takes to delete
20:24 phox frankly I wish NFS Just Worked(tm) so I could use that instead :)
20:24 phox but NFS is the most incredibly broken bit of code I've seen in a long time... sadly
20:27 stickyboy joined #gluster
20:38 jclift_ Oh awesome.  Running the Gluster Test Suite on CentOS 6.3 (original GA version, unpatched), works fine.
20:39 jclift_ Yum updating the box to latest CentOS, and voila, the Gluster Test Suite results in a kernel panic.
20:40 jclift_ So much for "this should now be simple"
20:41 jclift_ Well, at least it's in the same spot place that my RHEL test box is panic-ing on.
20:55 badone joined #gluster
20:56 badone joined #gluster
21:04 duerF joined #gluster
21:07 badone_ joined #gluster
21:25 andreask joined #gluster
21:29 duerF joined #gluster
21:31 joelwallis joined #gluster
21:36 manik joined #gluster
21:43 ctria left #gluster
21:55 zaitcev joined #gluster
21:59 Koma joined #gluster
22:19 sprachgenerator joined #gluster
22:19 sprachgenerator left #gluster
22:23 sprachgenerator joined #gluster
22:23 sprachgenerator left #gluster
22:27 sprachgenerator joined #gluster
22:27 sprachgenerator left #gluster
22:32 Savaticus joined #gluster
22:33 sprachgenerator joined #gluster
22:33 sprachgenerator left #gluster
22:45 _pol joined #gluster
22:48 sprachgenerator joined #gluster
22:53 manik joined #gluster
23:04 Shdwdrgn joined #gluster
23:05 fidevo joined #gluster
23:12 sprachgenerator joined #gluster
23:12 glusterbot New news from newglusterbugs: [Bug 928656] nfs process crashed after rebalance during unlock of files. <http://goo.gl/fnZuR>
23:12 glusterbot New news from resolvedglusterbugs: [Bug 769804] Glusterd crashes when you add brick with replica 2 to the striped-replicate volume <http://goo.gl/EOlzN> || [Bug 782373] Gluster unable to fetch the volume info <http://goo.gl/jZ6pt> || [Bug 808452] nfs: mem leak found with valgrind <http://goo.gl/p1jb6> || [Bug 812844] cannot mount subdirectory on Solaris client <http://goo.gl/Au34B> || [Bug 831940] dd on nfs mount faile
23:14 sprachgenerator left #gluster
23:21 bstr_ joined #gluster
23:37 kedmison joined #gluster
23:59 badone joined #gluster
