IRC log for #gluster, 2013-04-15

All times shown according to UTC.

Time Nick Message
00:21 hagarth joined #gluster
00:40 bala joined #gluster
00:44 hagarth joined #gluster
00:55 yinyin joined #gluster
01:14 portante joined #gluster
01:15 hagarth joined #gluster
01:24 d3O joined #gluster
01:30 d3O joined #gluster
01:30 kevein joined #gluster
01:33 hagarth joined #gluster
01:58 d3O_ joined #gluster
02:17 kevein joined #gluster
02:54 saurabh joined #gluster
03:08 vshankar joined #gluster
03:13 kevein joined #gluster
03:26 bharata joined #gluster
04:07 sgowda joined #gluster
04:14 vpshastry joined #gluster
04:16 kshlm joined #gluster
04:16 kshlm joined #gluster
04:19 itisravi joined #gluster
04:22 bala joined #gluster
04:23 hagarth joined #gluster
04:27 glusterbot New news from newglusterbugs: [Bug 952029] Allow an auxiliary mount which lets users access files using only gfids <http://goo.gl/x5z1R>
04:27 atrius joined #gluster
04:28 premera joined #gluster
04:29 Unidentified4773 joined #gluster
04:32 sgowda joined #gluster
04:34 raghu joined #gluster
04:35 bulde joined #gluster
04:36 bharata joined #gluster
04:39 lala_ joined #gluster
04:39 harish joined #gluster
04:39 Shdwdrgn joined #gluster
04:41 saurabh joined #gluster
04:41 jbrooks joined #gluster
04:46 ramkrsna joined #gluster
04:46 ramkrsna joined #gluster
04:46 vpshastry joined #gluster
04:58 hchiramm_ joined #gluster
04:59 mohankumar joined #gluster
05:04 piotrektt_ joined #gluster
05:05 shylesh joined #gluster
05:09 bala joined #gluster
05:14 aravindavk joined #gluster
05:31 deepakcs joined #gluster
05:39 rotbeard joined #gluster
05:40 deepakcs joined #gluster
05:53 rgustafs joined #gluster
05:59 deepakcs joined #gluster
05:59 itisravi joined #gluster
06:00 guigui3 joined #gluster
06:01 itisravi joined #gluster
06:04 guigui3 joined #gluster
06:05 bulde1 joined #gluster
06:08 Nevan joined #gluster
06:11 deepakcs joined #gluster
06:19 msvbhat joined #gluster
06:30 ricky-ticky joined #gluster
06:30 sjoeboo_ joined #gluster
06:31 magnus^^p joined #gluster
06:46 Oneiroi joined #gluster
06:51 vpshastry1 joined #gluster
06:54 bulde joined #gluster
07:04 ctria joined #gluster
07:08 vimal joined #gluster
07:10 samppah @latest
07:10 glusterbot samppah: The latest version is available at http://goo.gl/zO0Fa . There is a .repo file for yum or see @ppa for ubuntu.
07:10 puebele joined #gluster
07:15 hybrid512 joined #gluster
07:16 ekuric joined #gluster
07:23 tjikkun_work joined #gluster
07:27 rastar joined #gluster
07:29 puebele1 joined #gluster
07:46 hchiramm_ joined #gluster
07:48 ngoswami joined #gluster
07:49 spider_fingers joined #gluster
07:50 itisravi joined #gluster
07:51 magnus^^^p joined #gluster
07:58 glusterbot New news from resolvedglusterbugs: [Bug 815330] nlm: server reboot gives lock to second application <http://goo.gl/8pVZm>
08:05 Norky joined #gluster
08:07 ekuric joined #gluster
08:09 ujjain joined #gluster
08:23 guigui3 joined #gluster
08:41 ekuric joined #gluster
08:42 vpshastry1 joined #gluster
08:42 ekuric joined #gluster
08:58 tryggvil joined #gluster
09:05 tryggvil joined #gluster
09:23 sgowda joined #gluster
09:24 jag3773 joined #gluster
09:32 brunoleon__ joined #gluster
09:35 edward1 joined #gluster
09:38 spider_fingers left #gluster
09:49 rastar joined #gluster
09:50 jclift joined #gluster
09:52 ingard__ hi guys
09:52 ingard__ i've got a lot of glustermounts on some of my boxes and each of those mounts consist of 40-
09:52 ingard__ ish bricks
09:52 ingard__ which means a lot of connections
09:53 ingard__ can i force the client to not connect out on for instance port 22 somehow?
09:59 bleon joined #gluster
10:02 sgowda joined #gluster
10:06 red-solar joined #gluster
10:06 magnus^^p joined #gluster
10:25 vpshastry1 joined #gluster
10:28 sgowda joined #gluster
10:38 Bonaparte joined #gluster
10:42 rastar joined #gluster
10:49 Norky ingard__, AFAIK the client should only use non-privileged ports ( >1024 )
10:49 Bonaparte Hello. I 'peer probed' a new host. On the new host, the status is shown as State: "Peer Rejected (Connected)"
10:49 Bonaparte This looks like a potential problem. How can I rectify this?
10:49 ingard__ Norky: https://bugzilla.redhat.com/show_bug.cgi?id=762989
10:49 glusterbot <http://goo.gl/kF50c> (at bugzilla.redhat.com)
10:49 glusterbot Bug 762989: low, high, ---, rabhat, MODIFIED , Possibility of GlusterFS port clashes with reserved ports
10:51 ingard__ doesnt seem to be anything I can do about it with the version I am on
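
For later readers, a rough sketch of what can be checked or tried here. The netstat call simply lists which local ports the gluster processes are holding; the sysctl key is the mechanism that releases carrying the fix for bug 762989 consult when picking a privileged source port, and it only exists on newer kernels. The port list is just an example, and older releases such as ingard__'s ignore it entirely.

    # which local ports are the gluster processes using right now?
    netstat -tnp | grep -i gluster

    # on kernels that have it, and gluster releases with the bug 762989 fix,
    # ports listed here are skipped when gluster picks a privileged source port
    # (example list; adjust to the services you want protected)
    sysctl -w net.ipv4.ip_local_reserved_ports=22,111,2049
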
10:53 duerF joined #gluster
10:55 H__ Is there a way to heal only a specific pair of bricks instead of the entire volume ?
10:58 glusterbot New news from newglusterbugs: [Bug 874498] execstack shows that the stack is executable for some of the libraries <http://goo.gl/NfsDK>
11:00 piotrektt joined #gluster
11:11 manik joined #gluster
11:14 hagarth joined #gluster
11:15 Bonaparte After detaching a host, the host is detached as expected. When glusterd is restarted, the detached host is listed again in peer status
11:15 Bonaparte How can I make gluster forget the host permanently?
11:15 bulde joined #gluster
11:15 H__ semiosis: do you have a 3.3 updated version of http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/ ? (as in: let find skip the .glusterfs/ tree for instance)
11:15 glusterbot <http://goo.gl/E3b2r> (at community.gluster.org)
11:16 H__ Bonaparte: peer detach <HOSTNAME> [force] - detach peer specified by <HOSTNAME>
11:16 H__ does that not work after a reboot ?
11:16 Bonaparte H__, I used that command. After restarting glusterd, the host is back!
11:17 Bonaparte H__, you mean server reboot?
11:17 H__ well, either a glusterd or a server reboot.
11:17 xavih joined #gluster
11:17 Bonaparte H__, no, that does not work after glusterd restart
11:19 Bonaparte H__, here's the sequence -> http://paste2.org/LhfCZZUW s5local keeps coming back
11:19 glusterbot Title: Paste2.org - Viewing Paste LhfCZZUW (at paste2.org)
11:20 H__ Maybe this works as a hackery workaround: stopping all glusterd, removing the peers/file-in-question on all nodes and restart the set. But that's ill-advised. it really should be the gluster peer commands .
11:21 H__ Bonaparte: why the force ?
11:21 manik1 joined #gluster
11:21 H__ it is "State: Peer Rejected (Disconnected)" btw
11:21 Bonaparte H__, it kept on saying some host was disconnected. Thus I had to use force
11:22 Bonaparte H__, that is my original problem. When I added the new host, it was in Peer Rejected status. Now, I am trying to remove that host
11:22 H__ I cannot help you further; wait for someone else or a dev to show up
11:23 H__ I'll just continue hacking my way around replace-brick ;-)
11:23 jclift If if helps, I got around that "Peer Rejected" message on the weekend by shutting down all gluster processes on all nodes, then starting them up again after they were all off.
11:23 jclift s/If if/If it/
11:23 glusterbot What jclift meant to say was: If it helps, I got around that "Peer Rejected" message on the weekend by shutting down all gluster processes on all nodes, then starting them up again after they were all off.
11:23 jclift glusterbot--
11:23 * jclift is perfectly capable of correcting his own text
11:23 H__ LOL
11:26 jclift But I should point out I was doing things in a completely test/dev environment, so I do all kinds of weird things that can potentially break stuff for other people atm.
11:26 jclift The restarting glusterd processes though should be safe. :)
11:27 jclift Bonaparte: Is this production data?
11:27 Bonaparte jclift, yes
11:27 jclift k.  Was a thought. ;)
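
For reference, one commonly suggested recovery sequence for a "Peer Rejected" node is sketched below. It is only a sketch: hostnames are placeholders, it is more invasive than the plain restart jclift describes, and taking a copy of /var/lib/glusterd first is strongly advised.

    # on the rejected peer only (back up first)
    service glusterd stop
    cp -a /var/lib/glusterd /var/lib/glusterd.bak
    cd /var/lib/glusterd && ls | grep -v '^glusterd.info$' | xargs rm -rf
    service glusterd start

    # then re-probe and pull the volume definitions back
    gluster peer probe rejected-host.example.com      # run on a healthy peer
    gluster volume sync good-host.example.com all     # run on the rejected node
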
11:31 spider_fingers joined #gluster
11:34 manik joined #gluster
11:42 Bonaparte glusterd won't start at all
11:44 hchiramm_ joined #gluster
11:44 H__ Bonaparte: "tail -F /var/log/glusterfs/*.log" and stop/start gluster
11:46 Bonaparte H__, http://paste2.org/UBkwhIcj
11:46 glusterbot Title: Paste2.org - Viewing Paste UBkwhIcj (at paste2.org)
11:46 spider_fingers left #gluster
11:47 H__ Bonaparte: any idea what happened here ? -> E [glusterd-store.c:1320:glusterd_store_handle_retrieve] 0-glusterd: Unable to retrieve store handle for /var/lib/glusterd/vols/p2s/info, error: No such file or directory
11:49 Bonaparte H__, no. File is not there at all
11:49 ollivera joined #gluster
11:50 H__ on all nodes ?
11:50 Bonaparte H__, no, only on this node. The other nodes have the file
11:54 Bonaparte H__, any suggestions on how to recover?
11:56 H__ not sure. this is production data for you so my mere-user advice is dangerous :-D
11:57 ndevos Bonaparte: maybe check 'gluster volume help' and see 'gluster volume sync'
11:59 Bonaparte ndevos, I think glusterd has to start first
12:00 Bonaparte ndevos, not sure how volume sync would help me get glusterd started
12:04 bulde1 joined #gluster
12:05 hagarth joined #gluster
12:08 vpshastry1 joined #gluster
12:10 samppah how often geo-replication crawls through the volume to check if sync is needed?
12:17 guigui1 joined #gluster
12:27 kkeithley AFAIK geo-rep doesn't crawl the volume. (Maybe the first time?) After that the marker framework is used to keep track of what needs to be replicated and geo-rep uses that.
12:29 samppah kkeithley: okay, should that be near realtime or how long the delay is before it syncs data again?
12:32 ndevos Bonaparte: oh, right - I think I'd just try copying the info file over :)
12:32 spider_fingers joined #gluster
12:33 kkeithley Not sure. I'd like to think it'd be close to immediately after a file is closed after writing to it. bulde1, hagarth, any thoughts?
12:35 kkeithley Bonaparte, ndevos: I was going to look at a multi-brick config before I suggested that. The contents are pretty straightforward (obvious). If you're feeling bold you could try copying, as ndevos suggests, from another node.
12:36 Bonaparte Okay, ndevos, kkeithley I will try that after backing up data again :)
12:36 bet_ joined #gluster
12:37 hagarth kkeithley, samppah: the marker framework does involve crawling. The marks or hints are placed only on directories. Marker only helps in avoiding a full crawl of the volume.
12:37 hagarth samppah: the time difference between crawls is 600 seconds, if I remember correctly.
12:38 samppah hmm
12:38 * ndevos is pretty sure its 10 minutes too
12:41 aliguori joined #gluster
12:43 samppah this is the last line in log file: [2013-04-15 14:54:47.67586] I [master:669:crawl] _GMaster: completed 1 crawls, 0 turns
12:43 samppah timezone is UTC+2 :)
12:45 hagarth samppah: is this with alpha2?
12:46 samppah hagarth: oh right, git branch release-3.4
12:48 hagarth samppah: and nothing gets synced?
12:49 samppah hagarth: initial sync was fine but it looks like it's not syncing changes
12:50 samppah hmm
12:50 hagarth samppah: are these VM images?
12:50 samppah hagarth:
12:50 dustint joined #gluster
12:50 samppah yes
12:50 samppah and just after my last line it started syncing again
12:50 hagarth samppah: how are you determining that syncing does not happen?
12:50 hagarth ah ok :)
12:52 spider_fingers joined #gluster
12:53 samppah hagarth: it seems to sync one file at time?
12:54 hagarth samppah: files are identified during a crawl and there are workers which do the actual sync
12:54 hagarth you can increase the worker count for more parallelism
12:55 kkeithley Bonaparte: yes, I think it should be safe to copy the info file from another node. I just created a two brick DHT volume (one brick per node) and the /var/lib/glusterd/vols/$volname/info files were the same on both nodes.
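
A sketch of what kkeithley and ndevos are suggesting, using the volume name from Bonaparte's earlier error; "goodnode" is a placeholder for any peer that still has the file.

    service glusterd stop
    scp goodnode:/var/lib/glusterd/vols/p2s/info /var/lib/glusterd/vols/p2s/info
    service glusterd start
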
12:55 hagarth however, there is a single thread that does the identification or crawling, so you might see parallelism happening in a burst after a lull period.
12:55 samppah hagarth: okay, how can i increase workers? haven't seen any documentation about that
12:56 Bonaparte kkeithley, I copied the volume info and bunch of other files. glusterd started. Now, gluster doesn't see the other peers
12:56 Bonaparte kkeithley, it is probably missing other metadata too
12:57 H__ which log file should show that a stat on a file repaired a missing replicate copy on a brick ?
12:58 dustint joined #gluster
12:58 Chiku|dc hi Why I got this messages ?
12:58 Chiku|dc [2013-04-15 14:57:32.227493] W [client.c:2069:client_rpc_notify] 0-VOL_REPLICATED-client-0: Cancelling the grace timer
12:58 Chiku|dc [2013-04-15 14:57:32.228118] I [client.c:127:client_register_grace_timer] 0-VOL_REPLICATED-client-0: Registering a grace timer
12:58 Chiku|dc [2013-04-15 14:57:32.228143] I [client.c:2090:client_rpc_notify] 0-VOL_REPLICATED-client-0: disconnected
12:58 Chiku|dc I umount the vol on the clients
12:58 flrichar joined #gluster
13:00 hagarth samppah: let me check
13:01 sjoeboo_ joined #gluster
13:02 hagarth samppah: you can use sync-jobs as described here - http://gluster.org/community/documentation/index.php/Gluster_3.2:_gluster_Command
13:02 glusterbot <http://goo.gl/Flf2b> (at gluster.org)
13:03 samppah hagarth: thanks :)
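
The knob hagarth is pointing at looks roughly like this, per the linked 3.2 docs; MASTERVOL, the slave host and slave volume are placeholders, and the worker count is just an example.

    gluster volume geo-replication MASTERVOL SLAVEHOST::slavevol config sync-jobs 4
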
13:04 robo joined #gluster
13:06 Helfrez joined #gluster
13:08 Helfrez Hey samppah you around?
13:08 samppah Helfrez: hey, i'm here but i need to leave soonish
13:09 Helfrez quick question, +3yrs later I never did use the atom 330 servers lol
13:09 samppah :D
13:09 jdarcy joined #gluster
13:09 samppah oh, you are that guy...
13:09 Helfrez we had quite a few discussions about that if you remember
13:09 dustint joined #gluster
13:09 Helfrez yes! gluster has been humming away that long, went with quad xeons instead
13:10 Helfrez works marvelously hosting kvm with native client
13:10 samppah cool, so you got good hardware to play with :)
13:10 Helfrez yeah, everything has been fully deployed for about 3yrs now
13:11 Helfrez now fast forward, wanted to do some testing with vmware stuff
13:11 Helfrez only options is nfs of course
13:12 Helfrez because of the lack of the native client, I am seeing some pretty heavy cpu spikes during writes
13:12 Helfrez is that normal/healthy? I would assume its the servers forcing the brick update on the other peer
13:13 pdurbin left #gluster
13:13 Helfrez and its easy to replicate, I just don't like seeing 389% cpu spikes
13:14 samppah Helfrez: so you are seeing heavy cpu usage when there is io on vm images that are hosted over nfs?
13:14 Helfrez yes, and I would expect some usage spikes, just like gluster native client, but its basically 1 glusterfsd maxing out 4 cores
13:16 Helfrez The same load from the native client generates maybe 20% usage
13:17 Helfrez I need to run the test from a non-vmware client as a test as well, now that I think about it
13:17 samppah have you looked at how that load looks like at client side?
13:17 Helfrez client side cpu on the hosts is ok
13:18 Helfrez I need to run that test real quick as well, non-vmware nfs, to see if it's something vmware is doing versus the transport itself maybe
13:18 y4m4 joined #gluster
13:18 Helfrez but gluster native, proxmox atm, = 20-30%..vmware nfs to single server= 389%
13:21 samppah i have seen some high load on glusters client side when vm is performing lots of random io
13:22 samppah just wondering if that could be the case since i think that nfs server is just client with nfs translator
13:23 samppah please feel free to correct me if my assumption is wrong :)
13:23 Helfrez yeah, I am trying to think it out and figure out if its broken,misconfigured or just WAI
13:24 Helfrez but I am fairly confident I could mount straight nfs, and mirror something, saturate the interfaces without saturating the cpu
13:24 Helfrez my concern is slowdown under the maxed cpu
13:28 hagarth joined #gluster
13:29 Helfrez AHAH, now there is an interesting result
13:30 Helfrez if I go from a non-vmware host native linux system, 2.6 kernel, basic nfsmount with only the vers=3 option
13:30 Helfrez run the same test
13:31 Helfrez it averages about 80%
13:31 samppah huh
13:32 samppah what test you are running btw?
13:32 Helfrez im just doing a dd with fdasync to saturate line
13:32 Helfrez so worst case scenario
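
What that kind of test typically looks like; the dd flag is conv=fdatasync, and the path and sizes here are only examples.

    # streaming write through the mount, forcing the data to disk at the end
    dd if=/dev/zero of=/mnt/gluster/ddtest bs=1M count=4096 conv=fdatasync
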
13:33 mohankumar joined #gluster
13:34 Helfrez so from within a vmware vm, running from a nfs mount, it uses about 90% split between glusterfs and glusterfsd
13:42 dustint joined #gluster
13:45 Nagilum_ I have 3.4 alpha2 installed and wonder, should I worry about these mesages: E [socket.c:2767:socket_connect] 0-management: connection attempt failed (Connection refused) ?
13:46 Nagilum_ is there some new service that I need to install start? (compared to 3.3)
13:46 jiffe98 joined #gluster
13:49 chirino joined #gluster
13:54 tryggvil joined #gluster
13:57 lalatenduM joined #gluster
13:59 rgustafs joined #gluster
14:00 robos joined #gluster
14:02 semiosis H__: targeted self heal should not be needed on 3.3+ since there is a self heal daemon
14:05 itisravi joined #gluster
14:05 H__ well
14:05 H__ i beg to differ ;-)
14:05 vpshastry joined #gluster
14:06 tryggvil joined #gluster
14:06 H__ because replace-brick kills my volume i used an rsync (and later a bsdtar, much faster) to copy a brick 'underwater' with xattrs and hardlinks included. Then forced a replace-brick, and now I need to 'fix' the missing stuff during the underwater copy.
14:07 manik joined #gluster
14:08 H__ If I do a full volume heal I have to wait weeks for it to complete. That's why I want to do a targeted self heal.
14:08 jskinner_ joined #gluster
14:08 H__ semiosis: maybe you have better ideas / alternatives ?
14:08 semiosis i have no idea what an underwater copy is
14:09 H__ brick -> brick
14:09 H__ 'below' glusters knowledge layer
14:09 Supermathie H__: i.e. behind the scenes?
14:09 H__ yes
14:10 hagarth joined #gluster
14:11 H__ semiosis: I use this for 3.3 : find /gluster/a/ -path '*.glusterfs/*' -prune -o -printf "/mnt/vol01/%P\0" | xargs -0 stat --format='%i %n'
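
H__'s one-liner, unpacked. His brick is /gluster/a and his client mount is /mnt/vol01: walk the brick while pruning the .glusterfs/ metadata tree, then stat each corresponding path through the client mount so access triggers self-heal.

    # skip .glusterfs/ on the brick, rewrite each path onto the mount,
    # and stat it there so the client heals the file on access
    find /gluster/a/ -path '*.glusterfs/*' -prune -o -printf "/mnt/vol01/%P\0" \
      | xargs -0 stat --format='%i %n' > /dev/null
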
14:12 semiosis bbiab
14:17 Supermathie ndevos: You around?
14:26 guigui3 joined #gluster
14:27 andreask joined #gluster
14:27 dustint joined #gluster
14:31 rb2k joined #gluster
14:32 rb2k hey! is louis zuckermann around by any chance?
14:32 rb2k or anybody else that could give me a hint a packaging deb packages for gluster
14:32 ndevos Supermathie: yeah, but a little busy atm - have you seen the latest patch I have attached to the bug?
14:33 * ndevos tested Linux NFS a little with that one
14:33 Supermathie Maybe I didn't get the latest one from the patch...
14:33 Supermathie checking
14:34 ndevos Supermathie: the one in the email was not the latest (I think)
14:35 Supermathie ferent...
14:36 Supermathie mmm... the one in https://bugzilla.redhat.com/attachment.cgi?id=735301 is slightly different...
14:36 glusterbot <http://goo.gl/p5o8Z> (at bugzilla.redhat.com)
14:36 semiosis rb2k: pong
14:36 rb2k ahhh!
14:36 rb2k hey :)
14:37 rb2k I'm a noob when it comes to packaging deb files :( especially with the correct init.d stuff
14:37 rb2k I have the branch I want cleanly compiling
14:37 Nagilum_ rb2k: http://torbjorn-dev.trollweb.​net/gluster-3.4.0alpha2-debs/
14:37 glusterbot <http://goo.gl/rrqNd> (at torbjorn-dev.trollweb.net)
14:37 rb2k Nagilum_: I need 3.3.2qa1 :)
14:38 rb2k but thanks
14:38 semiosis rb2k: what distro/version exactly?
14:39 Nagilum_ rb2k: then it should not be that hard to adjust http://download.gluster.org/pub/gluster/glusterfs/3.3/LATEST/Debian/
14:39 glusterbot <http://goo.gl/tqbmV> (at download.gluster.org)
14:39 rb2k Nagilum_: all of them. 32/64 hardy/lucid/precise
14:40 rb2k (we run a few machines that need a fix in qa1)
14:40 semiosis hardy?!?!?!
14:40 Nagilum_ heron!
14:40 rb2k well, that one is dying at the end of the month :)
14:40 spider_fingers left #gluster
14:42 rb2k semiosis: is there any doc on how you compile those debs on launchpad?
14:42 Chiku|dc when do you think 3.4 will be release ?
14:43 semiosis rb2k: here's the easy way. your build machine will need pbuilder and devscripts (packages) installed
14:43 Chiku|dc now it's alpha, then beta and rc too ?
14:43 rb2k semiosis: listening
14:44 rb2k Chiku|dc: that's how it usually works
14:44 Chiku|dc so at least for few months ?
14:45 semiosis rb2k: then get the source tarball, which you have already, rename it glusterfs_3.3.2.orig.tar.gz, and make a clean extraction of it in the same directory
14:47 semiosis now go get the source package file from my ppa for the distro/version you're building... you can find them here: https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.3/+packages
14:47 glusterbot <http://goo.gl/3YP68> (at launchpad.net)
14:47 nueces joined #gluster
14:47 semiosis the file you want is the .debian.tar.gz file, which just has the debian/ folder in it
14:47 semiosis extract that into the clean source tree you just extracted previously
14:47 piotrektt joined #gluster
14:48 semiosis now in that source tree, you need to edit the debian/changelog, adding a new entry at the top for the glusterfs version you're building and distro release you're targeting
14:49 neofob left #gluster
14:49 rb2k so far so good
14:49 rb2k :)
14:50 semiosis then in the root of that source tree, run debuild (I do debuild -S -sa) this will prepare the source package for building by creating a .dsc file in the parent directory, next to the orig.tar.gz
14:50 semiosis you can use pbuilder to build the package
14:50 semiosis pbuilder --build glusterfs_3.3.2-precise1.dsc or whatever
14:51 rb2k sweet
14:51 rb2k ok, let me give that a try
14:51 semiosis thats the strategy, you'll probably need to read up on the tools & package policies to fill in some details
14:51 semiosis good luck
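
Pulling semiosis's steps together into one sketch. The tarball name, version string and target release are examples only, and the debian/ directory still has to come from the .debian.tar.gz in his PPA as described above.

    sudo apt-get install pbuilder devscripts
    cp glusterfs-3.3.2qa1.tar.gz glusterfs_3.3.2.orig.tar.gz
    tar xzf glusterfs_3.3.2.orig.tar.gz
    cd glusterfs-3.3.2              # rename the unpacked tree if it differs;
                                    # drop the PPA's debian/ dir into it here
    dch -v 3.3.2-precise1 --distribution precise "rebuild of 3.3.2qa1"
    debuild -S -sa                  # writes ../glusterfs_3.3.2-precise1.dsc
    sudo pbuilder create --distribution precise    # once per target release
    sudo pbuilder --build ../glusterfs_3.3.2-precise1.dsc
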
14:53 Supermathie ndevos: Well, Linux NFS is still OK, but gluster NFS server still can't decode the FSINFO params with DNFS.
14:54 rb2k semiosis: thanks!
14:56 H__ what does this mean exactly ? -> I [dht-layout.c:593:dht_layout_normalize] 0-vol01-dht: found anomalies in /foo/bar. holes=1 overlaps=0
14:56 rotbeard joined #gluster
14:56 jbrooks joined #gluster
15:08 ndevos Supermathie: hmm, can you capture a tcpdump again?
15:10 Supermathie I did, it looks the same and I still get: [2013-04-15 10:50:14.961149] E [nfs3.c:4741:nfs3svc_fsinfo] 0-nfs-nfsv3: Error decoding arguments
15:11 neofob joined #gluster
15:12 Nagilum_ with 3.4a2 should I worry about these messages: E [socket.c:2767:socket_connect] 0-management: connection attempt failed (Connection refused)
15:12 Nagilum_ ?
15:17 ndevos Supermathie: can you compare the fhandle that is returned in the MNT Reply and the one that is used in the FSINFO Call?
15:18 daMaestro joined #gluster
15:22 Supermathie ndevos: They are the same.
15:22 Supermathie (as they were before) :)
15:22 Supermathie <3 the wireshark authors for having a crc32 of the FH in the packet info - makes it easy to compare.
15:24 rb2k semiosis: https://gist.github.com/rb2k/2d646592081f9209930f any idea about this one?
15:24 glusterbot <http://goo.gl/p5MTW> (at gist.github.com)
15:24 rb2k (when running a plain "debuild"
15:26 semiosis well that file is named in debian/glusterfs-common.install -- you could try removing that line from that file
15:27 bugs_ joined #gluster
15:30 ndevos Supermathie: hmm... I'm not sure whats happening then :-/ the fhandle in the FSINFO should now get decoded even if it is missing the additional 00 bytes (for xdr roundup)
15:34 ndevos Supermathie: sprinkling some gf_log() in the nfs xdr file is probably the easiest way to figure out which function returns the error - gdb would be more time-consuming
15:34 Helfrez so I have narrowed it down a bit, but still no guesses what the problem is...
15:34 doc|holliday joined #gluster
15:37 Helfrez something vmware is doing is generating significantly more load than even a standard nfs mount from a OOB linux server to nfs
15:40 ricky-ticky joined #gluster
15:40 rb2k semiosis: removing a few .a files from that file seemed to have helped
15:40 rb2k if they are needed is another question
15:49 andreask joined #gluster
15:53 ash13 joined #gluster
16:00 doc|holliday in a distributed only cluster, if a brick being written to unexpectedly disappears (server died, network loss, etc), what will happen if I continue to write to the file?
16:00 doc|holliday will I just be redirected to another brick?
16:00 jclift_ joined #gluster
16:02 Supermathie ndevos:  0-glusterfs (nfs): xdr_opaque failed, data_len: 34
16:07 ndevos Supermathie: hmm, thats interesting! the int is read correctly, but xdr_opaque may still expect the data to have the additional 2 roundup bytes?
16:08 Dave2 joined #gluster
16:08 jskinner_ how do you online a brick?
16:09 Supermathie ndevos: could be, yeah.
16:09 jskinner_ we pulled a drive to do some testing, put the drive back in, and the brick still says offline
16:10 Jerderwerk joined #gluster
16:10 ndevos Supermathie: http://tools.ietf.org/html/rfc4506#section-3 seems to suggest that... I guess Oracle DNFS isn't particulary standard conform in that respect :-/
16:10 glusterbot Title: RFC 4506 - XDR: External Data Representation Standard (at tools.ietf.org)
16:11 ndevos jskinner_: 'gluster volume start MYVOL force' should start any missing processes - there is probably a glusterfd process missing for that brick
16:11 jskinner_ ok ill give that a shot
16:13 lh joined #gluster
16:14 jskinner_ still offline with no pid listed
16:14 jskinner_ the volume itself says online
16:14 jskinner_ just missing 2 bricks
16:17 doc|holliday joined #gluster
16:28 Supermathie ndevos: https://gist.github.com/Supermathie/5389349#file-gistfile1-c-L19
16:28 glusterbot <http://goo.gl/OoKrJ> (at gist.github.com)
16:34 sjoeboo_ joined #gluster
16:35 _Bryan_ having not been around for awhile..I deserve to be flogged...but I have an error that is new to me and causing quite a bit of grief....can anyone explain this?
16:35 _Bryan_ [2013-04-15 09:35:25.110471] W [dict.c:418:dict_unref] (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f134102f365] (-->/opt/glusterfs/3.2.5/lib64/glusterfs/3.2.5/xlator/protocol/client.so(client3_1_fstat_cbk+0x33b) [0x7f133d7b6a9b] (-->/opt/glusterfs/3.2.5/lib64/glusterfs/3.2.5/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x1eb) [0x7f133d55be5b]))) 0-dict: dict is NULL
16:36 _Bryan_ I should also mention...Gluster 3.2.5
16:36 Mo_ joined #gluster
16:36 Supermathie ndevos: So Linux's knfs (at least the particular one I checked) issues NFS filehandles length==20
16:37 Supermathie I bet NetApp does too. Which explains why Oracle works on those.
16:37 Supermathie I bet Oracle never tested against an NFS server that issues filehandles of length != 0mod4
16:38 * Supermathie patches glusterfs to issue filehandles of length 36 instead of 34...
16:38 Supermathie heh
16:38 hagarth joined #gluster
16:39 jskinner_ anyone familiar with debian?
16:39 jskinner_ I am having an issue mounting gluster nfs share with a debian squeeze client
16:40 jskinner_ I need to be able to mount it without specifying -o nfsvers=3
16:40 jskinner_ and also without fstab
16:40 jskinner_ mount -t nfs ipaddress:/glustervolume /mnt/share
16:40 jskinner_ needs to be able to work for me lol
16:40 Supermathie jskinner_: specify vers=3 option in mount command, otherwise it'll try nfs4 and go directly to the nfs port and not contact portmapper
16:40 lh joined #gluster
16:40 lh joined #gluster
16:40 jskinner_ :(
16:40 Supermathie (started typing that before seeing all that :)
16:40 jskinner_ lol its ok
16:40 Supermathie Try changing gluster's nfs daemon port to 2049
16:41 Supermathie So it's in the right place...
16:41 Supermathie "right"
16:41 jskinner_ I know with RHEL I can just edit the nfsmount.conf file
16:41 jskinner_ no such file on the debian box that I can find
16:41 portante joined #gluster
16:41 jskinner_ is that just nfs.port
16:42 Supermathie proooooooooobably
16:42 jskinner_ nfs.port  (38465 to 38467)
16:42 jskinner_ Found that here
16:42 Supermathie jskinner_: on my ubuntu, nfsmount.conf is supported
16:42 jskinner_ http://gluster.org/community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options#nfs.rpc-auth-null
16:42 glusterbot <http://goo.gl/OJoQR> (at gluster.org)
16:42 Supermathie jskinner_: it probably just doesn't exist yet.
16:43 jskinner_ does nfs need to be compiled to use that file? or could I just create it and cross my fingers lol?
16:44 jskinner_ the only thing I have is: /etc/defaults/nfs-common
16:45 Supermathie jskinner_: 'man 5 mount' and see if nfsmount.conf is mentioned
16:45 jskinner_ bah, box doesn't have man
16:45 jskinner_ lol
16:46 manik joined #gluster
16:46 Supermathie Just put the file there and see what happens :D
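
Two sketches of the options discussed above for getting a plain "mount -t nfs host:/vol /mnt/share" working. Whether nfs.port accepts 2049 depends on the gluster release, and nfsmount.conf only helps if the client's mount.nfs actually reads it, so treat both as things to try rather than guarantees.

    # server side: move gluster's NFS service onto the well-known port
    gluster volume set VOLNAME nfs.port 2049

    # client side: /etc/nfsmount.conf (create it if supported)
    [ NFSMount_Global_Options ]
    Defaultvers=3
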
16:47 jclift_ joined #gluster
16:49 hagarth joined #gluster
16:57 bulde joined #gluster
16:59 Jerderwerk hi
16:59 glusterbot Jerderwerk: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
17:00 crazy_cat_lady joined #gluster
17:00 crazy_cat_lady hello world!
17:00 Jerderwerk i'm trying to get a two-brick setup with glusterfs working
17:00 Jerderwerk the bricks will be on two physical machinces, with "replica" set to "2"
17:01 Jerderwerk at the moment i'm testing with  virtual machines
17:01 Jerderwerk for the most part gluster has been great, but one test reproducabily leads to data loss for me
17:02 crazy_cat_lady do glusterfs processes (on client side, running fuse translator) use the slab memory allocator as well? is it possible to somehow get statistics about the pools, similar to what volume status mem does for a brick?
17:02 tjstansell joined #gluster
17:02 Jerderwerk i write 300 10-mb files into the cluster mount on a client and shut one of the bricks down mid-procss
17:03 Jerderwerk i let the process complete with only one brick
17:03 bstansell joined #gluster
17:03 Jerderwerk when i bring the brick back online, it'll show the files it missed in the exported directory, but they're file size 0
17:03 tjstansell for a 2-node replica volume, what's the preferred way to verify that all files have been replicated between the bricks?
17:04 Jerderwerk the file that was being written when i shut the brick down will be incomplete
17:04 tjstansell i was hoping there'd be some sort of percentage statistic, like X inodes out of Y have been replicated.
17:04 tjstansell but i can't find anything like that.
17:04 Jerderwerk running a heal automatically will not fix the partial files but does copy the other files
17:04 Jerderwerk the partial file is recognized as split-brain
17:05 Jerderwerk doing the same process but shutting down the other node however lead to some files being zero-ed after a heal
17:05 Jerderwerk this is what disturbs me most
17:06 Jerderwerk the file will be zero-ed on both bricks, so the data is lost completely
17:06 Jerderwerk has anyone experienced something simliar?
17:08 y4m4 joined #gluster
17:08 hagarth joined #gluster
17:11 y4m4 joined #gluster
17:19 crazy_cat_lady left #gluster
17:24 _pol joined #gluster
17:25 ThatGraemeGuy joined #gluster
17:28 jskinner_ joined #gluster
17:36 theron joined #gluster
17:37 semiosis doc|holliday: operations will fail, transport endpoint not connected. you should try it and see for yourself. if reliability is important, use replication.
17:42 hagarth joined #gluster
17:57 lh joined #gluster
17:57 lh joined #gluster
18:00 jskinner_ joined #gluster
18:03 Supermathie ndevos: Unfortunately, fixing xdr_nfs_fh3 causes things elsewhere to fail. Catastrophically. The NFS daemon coredumps :(
18:13 lalatenduM joined #gluster
18:21 stefano joined #gluster
18:25 jbrooks joined #gluster
18:32 doc|holliday semiosis: thank you sir. I am I/P setting up test environment, but wanted to check a few things concurrent.
18:37 vpshastry joined #gluster
18:42 hagarth joined #gluster
19:01 hagarth joined #gluster
19:20 hagarth joined #gluster
19:26 m0zes my boss wants me to upgrade the kernels on our fileservers with minimal downtime, if I can SIGSTOP all processes writing to a distributed volume, can I reboot the fileservers and then SIGCONT the processes without loss of the locks?
19:30 kkeithley after you reboot, what processes do you imagine will be there to SIGCONT?
19:35 m0zes sorry these would be processes on the clients.
19:36 JoeJulian m0zes: upgrade from/to?
19:36 m0zes this would be an upgrade from 3.5.7 to 3.7.10
19:37 * JoeJulian double checks to see that he's still in #gluster
19:37 JoeJulian Ah, nevermind..
19:37 JoeJulian kernels...
19:38 JoeJulian If all the servers are gone at the same time, I don't think the locks will still exist...
19:38 JoeJulian but I could easily be wrong on that.
19:38 Supermathie WOOHOO IT WORKS! :)
19:38 m0zes I was planning on rebooting one at a time, but being that this is a pure distributed volume I wasn't sure it would matter.
19:40 JoeJulian m0zes: I think there's a strong chance that would work. Should be easy to run a test though.
19:41 m0zes I was thinking the same thing, I was just hoping someone had thought of or tried this before
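
A sketch of m0zes's plan, with a made-up process name; whether held locks survive the servers rebooting is exactly the open question JoeJulian raises, so this wants testing on a scratch volume first.

    # on each client: pause the writers
    pkill -STOP -f my_writer_process
    # ... reboot/upgrade the file servers, one at a time ...
    # then resume the writers
    pkill -CONT -f my_writer_process
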
19:46 Supermathie Aw man... apparently glusterfsd isn't able to drive the SSD hard enough before maxing out the CPU ;(
19:47 semiosis Supermathie: what application are you using to generate load?
19:48 semiosis if dd use bs=1M
19:49 Supermathie semiosis: Oracle :)
19:49 semiosis eh
19:50 Supermathie Well, jmeter hitting Oracle hitting gluster
19:51 genewitch joined #gluster
19:51 genewitch i think i messed up the peer probe
19:53 _pol joined #gluster
19:53 _pol_ joined #gluster
19:55 jdarcy joined #gluster
20:00 glusterbot New news from resolvedglusterbugs: [Bug 865914] glusterfs client mount does not provide root_squash/no_root_squash export options <http://goo.gl/hJiLH>
20:06 genewitch how do i force detach a peer if gluster thinks the peer hostname is localhost?
20:06 genewitch i forgot to run a sed command prior to peering so the hostnames were wrong
20:06 theCzar joined #gluster
20:08 theCzar OK, so when I try to install gluster from source on an ubuntu machine I get a strange error when I try to start the service: /usr/local/sbin/glusterd: error while loading shared libraries: libglusterfs.so.0: cannot open shared object file: No such file or directory
20:08 genewitch theCzar: did you sudo make install
20:08 genewitch cuz that's the error i'd expect if you did not
20:09 Supermathie theCzar: Is the right library directory in ld.so.conf?
20:09 theCzar I'm configuring with: ./configure --disable-ibverbs --disable-georeplication --enable-fusermount --libdir=/usr/local/lib
20:10 theCzar I'll check
20:10 theCzar yup it's directory is found in ld.so.conf
20:11 genewitch and is that file in that directory?
20:11 theCzar yes
20:11 Nagilum_ ldd -v /usr/local/sbin/glusterd
20:11 semiosis theCzar: why are you building from source?  is there something you need that existing packages don't address?
20:11 genewitch theCzar: ldconfig?
20:13 jbrooks joined #gluster
20:13 theCzar semiosis: yes, production system and we're trying to turn off certain things that we don't need.
20:13 genewitch what files do i rm if glusterd refuses to detach a peer because it thinks it is localhost
20:13 dustint_ joined #gluster
20:13 semiosis theCzar: oh ok
20:14 theCzar genewitch: AHA! ldconfig fixed it
20:20 genewitch that means the install script is busted
20:20 genewitch or just doesn't have that in there
20:20 genewitch :-D
20:22 genewitch do i need to be more descriptive to get help :-P glu2.example.com# gluster peer detach glu1.example.com force "glu1.example.com is localhost"
20:25 jskinner_ joined #gluster
20:28 genewitch nevermind i'll just redeploy the server
20:32 pib2001 joined #gluster
20:36 dustint_ joined #gluster
20:38 jskinner_ joined #gluster
20:40 genewitch probe on host glu4.example.com port 0 already in peer list
20:40 genewitch what am i doing wrong?
20:40 genewitch i can't peer anything
20:43 Nagilum_ remove the peer before trying to add it again?
20:45 genewitch with peer detach?
20:45 genewitch Nagilum_: it reports that glu3 and glu4 are not part of cluster
20:46 genewitch there's some directory i have to delete i think
20:46 brunoleon_ joined #gluster
20:47 Supermathie [2013-04-15 16:45:54.796230] E [glusterd-utils.c:1528:glusterd_volume_compute_cksum] 0-management: Could not generate temp file, reason: No space left on device for volume: gv0
20:48 Supermathie Tons of space left :/
20:48 Supermathie [2013-04-15 16:48:11.615660] W [client3_1-fops.c:1545:client3_1_finodelk_cbk] 0-gv0-client-9: remote operation failed: No such file or directory
20:48 Supermathie Any way to track down which is client-9?
20:48 semiosis @subvolume
20:48 glusterbot semiosis: I do not know about 'subvolume', but I do know about these similar topics: 'read-subvolume'
20:49 semiosis Supermathie: http://community.gluster.org/q/what-is-a-subvolume-what-does-subvolume-myvol-client-1-mean/
20:49 glusterbot <http://goo.gl/O2aLY> (at community.gluster.org)
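
For the record, gv0-client-9 is simply the tenth brick listed by "gluster volume info gv0" (counting from zero). The client volfile on the servers spells the mapping out; a quick way to check it, assuming the default /var/lib/glusterd location:

    # the remote-host/remote-subvolume options name the brick
    grep -A 5 'volume gv0-client-9' /var/lib/glusterd/vols/gv0/gv0-fuse.vol
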
20:50 Nagilum_ I have 3.4 alpha2 installed and wonder should I worry about these mesages: E [socket.c:2767:socket_connect] 0-management: connection attempt failed (Connection refused) ?
20:51 genewitch
20:54 Jerderwerk i'm using gluster 3.3.1, after a heal with two bricks in replica mode some files in the internal directories contain no data, i.e. 0-byte size
20:54 Jerderwerk it seems very similiar to this: http://www.gluster.org/pipermail/gluster-users/2011-June/030955.html
20:54 glusterbot <http://goo.gl/N2OTQ> (at www.gluster.org)
20:55 Jerderwerk does anyone have any experience with an issue similiar to this?
21:00 sjoeboo_ joined #gluster
21:05 Supermathie semiosis: Thanks, that helped
21:05 semiosis yw
21:17 duerF joined #gluster
21:17 Supermathie ?!?! [2013-04-15 17:17:46.270752] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-10: remote operation failed: Permission denied
21:18 bstansell left #gluster
21:29 _pol_ joined #gluster
21:30 _pol joined #gluster
21:32 Supermathie Hrm..... so one of my glusterfsd processes is totally unable to keep up with everything being thrown at it.
21:33 Supermathie PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21:33 Supermathie 24048 root      20   0 3481m 2.1g 2236 S 100.6  1.6  23:36.27 glusterfsd
21:33 Supermathie 24234 root      20   0 23.2g  18g 2660 S  7.6 14.5   3:48.18 glusterfs
21:41 Supermathie If I'm interpreting that right
21:41 brunoleon joined #gluster
21:47 squizzi joined #gluster
21:47 doc|holliday anyone tried running gluster over openvpn?
21:49 JoeJulian Supermathie: Yeah, I don't see cpu usage anywhere near that no matter what I do.
21:50 JoeJulian doc|holliday: Only troubles would be latency and reduced mtu. Those are just performance troubles though. It should work just fine.
21:50 doc|holliday JoeJulian: ok thank you
21:52 Jerderwerk after some testing it gets even worse. with a pool of four bricks and "replica" set to "4", dropping two nodes during a write process and bringing them back up lost most of the data
21:52 Supermathie Do diagnostics.latency-measurement
21:52 Supermathie Do diagnostics.latency-measurement and diagnostics.count-fop-hits introduce a significant load?
21:54 runlevel1 joined #gluster
21:54 JoeJulian Supermathie: I've never set those, so I don't know.
21:56 JoeJulian Jerderwerk: Were the files in sync before you brought down two bricks? Did the self-heal complete before you tried it again? Were the additional bricks the same two or different? You'll need to document your tests completely to be able to identify possible diagnoses.
21:57 runlevel1 joined #gluster
21:58 JoeJulian Jerderwerk: Start from scratch. Try to duplicate your test findings reliably, then file a bug if you're successful.
21:58 glusterbot http://goo.gl/UUuCq
21:59 Jerderwerk i wrote 300 10-mb files to a mounted volume. the volume had nodes A, B, C, D - i dropped B and a couple of files later dropped C. brought B back and ran "gluster volume heal volumename full", which reported a successful heal
22:00 Jerderwerk but in /export/brick1 there were some zero-d files. after bringing up node C and running the same heal command it deleted all files from file 81 onward, with the last couple of files being zero-d
22:00 Jerderwerk i've done this a couple of times now
22:00 Jerderwerk seems to be reproducible
22:01 Jerderwerk as i said earlier, i have the same issue with two machines
22:01 JoeJulian define "which reported a successful heal" because I think that may be a misinterpretation of the response message.
22:01 Jerderwerk well, heal processing initiated successfully i believe
22:03 runlevel1 joined #gluster
22:03 JoeJulian That's one thing that's bugging me. On a "heal...full" there's no way to know if it's completed.
22:04 tjstansell joined #gluster
22:04 Jerderwerk i've watched the /export/brick1 directory. the missing files showed up with zero byte size, and the data was copied
22:04 Jerderwerk until there were the 300 files size 10mb, except for the couple of files that stayed at zero size
22:04 JoeJulian But the self-heal still may not have been complete.
22:05 Jerderwerk is there any way to know if it completes?
22:05 JoeJulian "heal ... info" /should/ list any files with pending heals.
22:05 JoeJulian Also you could check the ,,(extended attributes) on the good bricks.
22:05 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://goo.gl/Bf9Er
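
An illustrative run against one of the 10 MB files on a brick. The attribute values below are made up; the point is that non-zero trusted.afr counters on a file mean that brick still records pending operations for the named replica, i.e. the heal is not complete.

    getfattr -m . -d -e hex /export/brick1/file_0042
    # trusted.afr.volumename-client-0=0x000000000000000000000000
    # trusted.afr.volumename-client-1=0x000000020000000100000000   <- pending for the other brick
    # trusted.gfid=0x7d4f2c9aa1b24c0f9d3e6b1a2c3d4e5f
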
22:06 stefano joined #gluster
22:06 Jerderwerk on my earlier tests "gluster volume heal volumename info", "info heal-failed" and "info split-brain" didn't show up anything
22:07 Jerderwerk i didn't look at them with the recent test. let me repeat the test and check the output of those commands, as well as the extended attributes
22:08 Jerderwerk although this wouldn't help me as the files did show up as zero size in the client
22:08 JoeJulian I'm, personally, not 100% sure of the results of those commands with regard to a "heal...full". Just a plain "heal" I'm more comfortable with.
22:08 Jerderwerk ok, will look at that
22:09 JoeJulian I misread that 0 size in the client. I was thinking you said 0 size on the brick.
22:09 Jerderwerk both
22:10 Jerderwerk the zero size propagated to all nodes and showed up in the client, as well as the loss of all files after file 81
22:10 JoeJulian If that's on the client then that does certainly sound like a valid bug. Not entirely surprising either as I don't think they test with more than replica 2. jdarcy did find a self-heal bug in replicas > 2. Try against 3.3.2qa1 or 3.4.0qa releases too.
22:12 Jerderwerk i ran into similiar issues with two replicas
22:12 Jerderwerk i would do a similiar test and alternately shut down the bricks
22:13 Jerderwerk if i had node A and B, i'd shut down A during a write process, let the process complete, bring it back up and heal, then start the write process and take down node B
22:13 Jerderwerk after bringing node B up i'd have some files with size zero, which also showed up in the client
22:14 Jerderwerk additionally, the file that was written to while i took down the node would not be healed and stay corrupted
22:14 JoeJulian I've done that many times and never experienced that, and I do replica 3, so there must be some other variable.
22:14 Jerderwerk but i think that falls under split-brain which gluster might not deal with
22:15 Jerderwerk i'm wondering if i'm doing something wrong, since it seems like a normal use case
21:15 Jerderwerk i've got /export/brick1 on the nodes as a separate partition, formatted with xfs, i'm using ubuntu 12.10 with 3.5.0 kernel
21:15 Jerderwerk using this ppa: https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.3
22:15 glusterbot <http://goo.gl/7ZTNY> (at launchpad.net)
22:16 semiosis o_O
22:16 Jerderwerk does any of that sound wrong?
22:16 JoeJulian I suggest you file a bug report with your complete test sequence and see if anybody can duplicate the results.
22:16 glusterbot http://goo.gl/UUuCq
22:17 Jerderwerk ok, i will do a quick write-up
22:18 Shdwdrgn joined #gluster
22:19 portante joined #gluster
22:21 duerF joined #gluster
22:28 genewitch how do i flush all old data from a gluster node?
22:28 genewitch like what folder is all the saved data in so i can just wipe it
22:29 lh joined #gluster
22:29 lh joined #gluster
22:29 genewitch worst case i can just destroy this VM and start a new one, but that is annoying
22:30 JoeJulian /var/lib/glusterd
22:37 genewitch thanks JoeJulian!
22:37 genewitch now if i could get in the VPN
22:37 JoeJulian btw... that's a good general rule-of-thumb to know. According to the filesystem hierarchy standard (FHS) program state is expected to be found under /var/lib for any program.
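
So, for a throwaway test VM like genewitch's, a reset amounts to something like the sketch below. It destroys every peer and volume definition on that node, and any brick contents (e.g. under /export/brick1) have to be removed separately if the data should go too.

    service glusterd stop
    rm -rf /var/lib/glusterd/*
    service glusterd start
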
22:40 genewitch i was digging around in /var prior to everything going down...
22:49 ash13 left #gluster
22:49 ash13 joined #gluster
22:52 sjoeboo_ joined #gluster
23:10 mohankumar joined #gluster
23:37 y4m4 joined #gluster
23:41 robo joined #gluster
23:57 duerF joined #gluster
23:59 hagarth joined #gluster
