
IRC log for #gluster, 2013-06-21


All times shown according to UTC.

Time Nick Message
00:36 thisisdave joined #gluster
00:41 ccha joined #gluster
00:44 Deformative joined #gluster
00:56 bala joined #gluster
01:00 jbrooks joined #gluster
01:10 kevein joined #gluster
01:24 mooperd joined #gluster
01:25 bulde joined #gluster
01:57 bulde joined #gluster
02:04 forest joined #gluster
02:11 bulde joined #gluster
02:22 forest joined #gluster
02:27 bharata joined #gluster
02:27 glusterbot joined #gluster
02:32 jbrooks joined #gluster
02:39 mohankumar joined #gluster
02:47 hagarth joined #gluster
03:05 jag3773 joined #gluster
03:39 bulde joined #gluster
03:41 itisravi joined #gluster
04:01 rjoseph joined #gluster
04:03 CheRi joined #gluster
04:05 badone joined #gluster
04:09 NcA^ joined #gluster
04:15 fidevo joined #gluster
04:30 mohankumar joined #gluster
04:34 jiku joined #gluster
04:44 shireesh joined #gluster
04:45 shireesh joined #gluster
04:54 lalatenduM joined #gluster
04:56 sgowda joined #gluster
05:09 krishna_ joined #gluster
05:16 jbrooks joined #gluster
05:21 aravindavk joined #gluster
05:26 anands joined #gluster
05:36 rastar joined #gluster
05:37 rastar1 joined #gluster
05:38 rastar1 joined #gluster
05:55 bala joined #gluster
05:55 ngoswami joined #gluster
05:57 glusterbot New news from newglusterbugs: [Bug 976641] RDMA mount fails with hang with transport type RDMA only <http://goo.gl/xX6Mw>
06:03 satheesh joined #gluster
06:04 dowillia joined #gluster
06:11 hagarth joined #gluster
06:14 jbrooks Oooh, this is the same issue I was hitting w/ my wacky soft-iwarp tests
06:14 jbrooks Maybe there's potential life there after all?
06:15 bala joined #gluster
06:17 ricky-ticky joined #gluster
06:21 jtux joined #gluster
06:22 satheesh joined #gluster
06:25 mmalesa joined #gluster
06:25 mmalesa joined #gluster
06:31 vshankar joined #gluster
06:31 bulde joined #gluster
06:31 vpshastry joined #gluster
06:33 krishna_ joined #gluster
06:38 mohankumar joined #gluster
06:38 ctria joined #gluster
06:39 saurabh joined #gluster
06:46 vimal joined #gluster
06:46 krishna__ joined #gluster
06:58 glusterbot New news from newglusterbugs: [Bug 975599] enabling cluster.nufa on the fly does not change client side graph <http://goo.gl/CTk2y>
07:00 forest joined #gluster
07:03 raghu joined #gluster
07:09 dowillia joined #gluster
07:17 hchiramm__ joined #gluster
07:21 ramkrsna joined #gluster
07:31 rastar joined #gluster
07:33 nIMBVS joined #gluster
07:34 nIMBVS hi. the "GlusterFS Concepts" page doesn't display the images. Instead of images the following text is shown: "Error creating thumbnail: Unable to save thumbnail to destination"
07:34 nIMBVS this is the page: http://www.gluster.org/community/documentation/index.php/GlusterFS_Concepts
07:34 glusterbot <http://goo.gl/cmDTp> (at www.gluster.org)
07:44 DEac- joined #gluster
07:56 satheesh joined #gluster
07:58 bulde joined #gluster
07:59 tziOm joined #gluster
08:07 hagarth joined #gluster
08:10 dowillia joined #gluster
08:18 sgowda joined #gluster
08:30 spider_fingers joined #gluster
08:31 forest joined #gluster
08:36 Staples84 joined #gluster
08:44 andreask joined #gluster
08:56 mmalesa joined #gluster
08:58 glusterbot New news from newglusterbugs: [Bug 905933] GlusterFS 3.3.1: NFS Too many levels of symbolic links/duplicate cookie <http://goo.gl/YA2vM> || [Bug 916375] Incomplete NLMv4 spec compliance <http://goo.gl/kjBbB> || [Bug 959477] nfs-server: stale file handle when attempting to mount directory <http://goo.gl/28GOf> || [Bug 960141] NFS no longer responds, get "Reply submission failed" errors <http://goo.gl/RpzTG> || [Bug 9624
09:10 dowillia joined #gluster
09:15 Nagilum_ joined #gluster
09:32 Nagilum_ hmm, I have a glusterfs mounted on a client, when I try to create a file in a certain directory it will fail due to "0-gv01-replicate-10: failing create due to lack of quorum"
09:33 Nagilum_ I have the same glusterfs mounted on another node and can create the file in the same directory without issue
09:33 Nagilum_ is there a way to "heal" the mount without unmounting?
09:34 Nagilum_ mount -o remount isn't supported it would seem
09:39 ramkrsna joined #gluster
09:39 ramkrsna joined #gluster
09:40 Nagilum_ the glusterfs(8) man page also references a couple of other glusterfs related manpages which I can't find anywhere
09:41 deepakcs joined #gluster
09:49 vpshastry1 joined #gluster
09:51 rjoseph joined #gluster
09:52 andreask joined #gluster
09:54 sgowda joined #gluster
09:57 saurabh joined #gluster
10:10 dowillia joined #gluster
10:17 vpshastry joined #gluster
10:22 manik joined #gluster
10:24 krishnan_p joined #gluster
10:42 edward1 joined #gluster
11:00 hjwp joined #gluster
11:04 hjwp hi all!  am looking for performance/tuning advice.  am looking at gluster as a replacement for an nfs fileserver, which is becoming a bit of a bottleneck
11:05 hjwp there are about half a dozen clients, all EC2 boxes, and (currently) one server
11:05 hjwp am thinking about doing a combination of striping and replication
11:05 hjwp but want to know what i can do in terms of client-side caching
11:05 hjwp something that emulates a sort of copy-on-read maybe?
11:09 kkeithley joined #gluster
11:11 dowillia joined #gluster
11:14 manik joined #gluster
11:22 saurabh joined #gluster
11:26 Norky you probably don't want striping, rather distribution & replication
11:27 Rocky__ joined #gluster
11:27 andreask joined #gluster
11:29 rjoseph joined #gluster
11:34 hjwp @Norky -- thats probably what I meant
11:34 dowillia joined #gluster
11:34 cakes joined #gluster
11:34 hjwp I'm going to start by just swapping out NFS for gluster, on a single server
11:35 hjwp I did a few experiments a week or so ago, with 2 servers replicated, and found write perf. better than nfs, read perf *much* worse.
11:35 hjwp but i assume there is some tuning i can do...
11:37 hjwp hm, ubuntu default is 3.2.  I assume I'll want to upgrade to 3.3?
11:37 andreask yes, and use more than two servers is also a good idea
11:42 hjwp @andreask -- how come?  i assume perf. degrades with every replicated server... do you mean additional distributed servers?
11:42 andreask hjwp: distributed-replicatet, yes
11:46 rcheleguini joined #gluster
11:47 CheRi joined #gluster
11:54 hjwp am going to start with just one client and one server
11:54 hjwp i want to see if i can setup caching on the client
11:54 hjwp and what that does to performance
11:57 hjwp do i have to setup a config file for that on the client?  or on the server?  or both?
11:59 glusterbot New news from newglusterbugs: [Bug 969461] RFE: Quota fixes <http://goo.gl/XFSM4>
12:01 hagarth joined #gluster
12:17 hjwp assuming it's just server-side config, and that filters down to clients...
12:17 hjwp yuk, gluster read perf. sucks compared to nfs so far.
12:24 hybrid5121 joined #gluster
12:28 manik joined #gluster
12:29 hjwp ok, if i understand things correctly, i can tweak the settings on the server using `gluster volume set` and they affect the client immediately
12:29 hjwp if i reduce the cache to 0 on the server, that worsens client perf...
12:29 hjwp seems no need to remount
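[Editor's note: a minimal sketch of the tuning hjwp is describing, assuming a volume named gv0; the option names come from the 3.3-era "gluster volume set help" output, and changes take effect on connected clients without a remount.]
    gluster volume set gv0 performance.cache-size 256MB   # io-cache size used on the client side
    gluster volume info gv0                                # confirms the option was recorded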
12:32 aliguori joined #gluster
12:34 hjwp it's still nowhere near the performance of the local filesystem
12:35 krokarion joined #gluster
12:38 manik joined #gluster
12:38 hjwp or NFS
12:40 krokar joined #gluster
12:48 mmarcin left #gluster
12:50 mooperd joined #gluster
12:52 kkeithley_ So, NFS isn't as fast as local file system either. And when NFS can do what GlusterFS does, I'd wager real money it won't be as fast as it is now either. ;-)
12:53 hjwp but once nfs caches
12:53 hjwp it is as fast as the local filesystem
12:55 kkeithley_ I'll say it again: when NFS can do what GlusterFS does, I'd wager real money it won't be as fast as it is now either.
12:56 kkeithley_ Use the GlusterFS NFS then, this will allow the client to do more caching.
12:57 hjwp here's my example with reading from a 50MB file:  first hit NFS=2s first hit gluster=4s, second hit NFS=20ms, second hit gluster=100ms
12:58 hjwp for the local filesystem, numbers are first hit 60ms second hit 20ms
12:58 Guest2858 joined #gluster
12:59 hjwp perhaps i'm being naive but, i mean, a cache is a cache right?
12:59 hjwp i'm totally prepared to pay a penalty for some things in return for lots of distributed/redundancy gluster awesomeness
13:01 hjwp but i hoped that cached reads would be at least in the same ballpark as nfs...
13:01 hjwp is there something I'm missing in terms of perf/cache tuning?
13:03 piotrektt joined #gluster
13:04 kkeithley_ Try mounting the volumes on the clients using NFS. Unless you've explicitly disabled NFS, gluster runs an NFS server too. Then the clients will do more caching.
13:04 kkeithley_ And you should see an improvement on the reads
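[Editor's note: a sketch of what kkeithley_ is suggesting, with hypothetical host and volume names; Gluster's built-in NFS server speaks NFSv3 over TCP only, so those options are spelled out explicitly.]
    mount -t nfs -o vers=3,proto=tcp server1:/gv0 /mnt/gv0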
13:05 rwheeler joined #gluster
13:06 ndevos kkeithley_: has there ever been an interest in using fscache with glusterfs/fuse ? I guess it should not be too difficult to add that to the fuse module
13:13 hjwp thanks, will give it a try...  although last time i did, switching to nfs improved read speed but killed write speed!  you can't win...  tanstaafl
13:20 hjwp btw, am just re-reading the chat logs, sorry kkeithley_, i promise you won't have to tell me to do *everything* twice...
13:21 kkeithley_ no worries
13:23 kkeithley_ ndevos: Yes, there's definite interest.
13:25 ndevos kkeithley_: cool, no idea how much work that is, maybe I'll check it out when I find some spare time
13:50 zetheroo joined #gluster
13:51 forest joined #gluster
13:52 mooperd joined #gluster
13:53 failshell joined #gluster
13:54 manik joined #gluster
13:57 jack joined #gluster
13:59 bulde joined #gluster
13:59 hjwp sure enough, perf under NFS is much better, at least for read
13:59 glusterbot New news from newglusterbugs: [Bug 976800] running dbench results in leaked fds leading to OOM killer killing glusterfsd. <http://goo.gl/kDARL>
14:00 hjwp write is a little faster for single files, but *much* slower (than the gluster native client) for multiple files
14:01 hjwp I'm a make a spreadsheet
14:04 failshell i just stumbled upon unified file and object storage
14:04 failshell omg
14:15 spider_fingers left #gluster
14:19 jthorne joined #gluster
14:22 bsaggy joined #gluster
14:33 forest joined #gluster
14:34 joelwallis joined #gluster
14:35 dbruhn Weird issue this morning, unable to remove or even ls a directory with some files that are apparently messed up on 3.3.1 TCP/IP, here is the volume log, any ideas? http://pastie.org/8066641
14:35 glusterbot Title: #8066641 - Pastie (at pastie.org)
14:37 dbruhn Here is the command line output while I am trying to do something with it. http://pastie.org/8066649
14:37 glusterbot Title: #8066649 - Pastie (at pastie.org)
14:42 forest joined #gluster
14:53 johnmark failshell: :)
14:53 johnmark failshell: it feels like we've been talking about that for a long time, but apparently not enough :)
14:55 jthorne joined #gluster
14:58 failshell just showed that to our chief architect
14:58 failshell he had a nerdgasm
15:00 hchiramm_ joined #gluster
15:08 hjwp here's some numbers, in case anyone is interested.  comments appreciated.  am about to try extending to 2 servers in the cluster...  https://docs.google.com/spreadsheet/ccc?key=0AhhyVcO7qVAZdGs1ZkpSY1FjMmZ1TUh0WkRLVkEyT0E#gid=0
15:08 glusterbot <http://goo.gl/pmyjn> (at docs.google.com)
15:09 JoeJulian dbruhn: Check your bricks and brick logs.
15:10 JoeJulian hjwp: What's your use case?
15:11 hjwp JoeJulian: currently we have a cluster of about half-a-dozen servers which all talk to a single fileserver
15:11 hjwp using NFS.  we want to remove the bottleneck and the SPOF
15:12 dbruhn Idea for a feature, a combined brick log
15:12 hjwp my ideal distributed filesystem would have a local cache and some kind of eventual consistency back to the cluster...
15:12 hjwp although that opens up the potential for lots of data integrity problems
15:13 hjwp but something that gets as close as possible to local performance wld be great
15:14 hjwp often it's just one client that needs to access a given file/directory at any one time...
15:14 hjwp especially for write access
15:14 johnmark failshell: lulz...
15:14 * johnmark not sure he wants to see a nerdgasm
15:15 jag3773 joined #gluster
15:15 mooperd joined #gluster
15:15 failshell im working on integrating RHS to our Spacewalk stack
15:15 failshell and then, im gonna have to figure out how to improve write speed a bit
15:15 failshell there has to be a bit of kernel tweaking i can do
15:16 bennyturns joined #gluster
15:20 hchiramm_ joined #gluster
15:26 aliguori_ joined #gluster
15:30 bsaggy_ joined #gluster
15:32 zaitcev joined #gluster
15:35 zetheroo left #gluster
15:42 forest joined #gluster
15:49 nightwalk joined #gluster
15:51 steve_ joined #gluster
15:52 Deformative joined #gluster
15:54 steve_ Hello, when gluster client hangs, i still see the volume mounted. how could i check the gluster status on client side? i only found tons of scripts to check glusterfsd status
15:54 mmalesa joined #gluster
15:55 lalatenduM joined #gluster
15:55 Eco_ joined #gluster
16:03 neofob On ESXi, the network throughput is good, for one or two users for my case ~100MB/s
16:03 neofob http://bit.ly/11S9NGV
16:07 stopbit joined #gluster
16:09 bala1 joined #gluster
16:16 Mo__ joined #gluster
16:17 hagarth joined #gluster
16:26 itisravi joined #gluster
16:28 hjwp hmmm
16:29 hjwp how do i add a new replica server+brick to the cluster?
16:29 hjwp add-brick is unhappy...
16:29 hjwp "supplied 1 with count 2"
16:30 steve_ left #gluster
16:31 hjwp i'm beginning to guess that i have to stop the volume?
16:37 semiosis hjwp: ,,(pasteinfo)
16:37 glusterbot hjwp: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
16:37 semiosis you shouldn't have to stop the volume
16:38 hjwp https://dpaste.de/cFVMF/
16:38 glusterbot Title: dpaste.de: Snippet #232193 (at dpaste.de)
16:38 hjwp i think if i could drop the replica count to 1 temporarily
16:39 hjwp then maybe i could add the new brick
16:39 hjwp and then set replica count back up to 3
16:39 hjwp i want 1 brick per server, and 1 replica per server...
16:39 semiosis that sounds complicated
16:39 semiosis whats your end goal here?
16:39 hjwp am doing some perf. investigations
16:39 hjwp https://docs.google.com/spreadsheet/ccc?key=0AhhyVcO7qVAZdGs1ZkpSY1FjMmZ1TUh0WkRLVkEyT0E#gid=0
16:39 glusterbot <http://goo.gl/pmyjn> (at docs.google.com)
16:40 hjwp it looks like clients that are also replica servers have better i/o performance
16:40 hjwp but i understand that the more replicas you have, the worse performance is
16:40 hjwp i want to see how the tradeoff pans out
16:40 semiosis so you want to run a test with different replica counts, i see
16:40 hjwp i don't actually mind stopping the volume and deleting
16:41 hjwp i was just curious to see if i could do it online...
16:41 semiosis you should be able to change replica count by doing add-brick replica $newcount $extrabrick
16:41 semiosis what version of glusterfs
16:41 hjwp cool, will try that now
16:41 semiosis >
16:41 hjwp thanks!
16:41 semiosis ?
16:41 hjwp 3.3
16:41 semiosis ok should work
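[Editor's note: a sketch of the add-brick form semiosis describes, assuming an existing replica-2 volume gv0 being raised to replica 3 with a brick on a hypothetical third server; supported from 3.3 onward.]
    gluster volume add-brick gv0 replica 3 server3:/export/brick1
    gluster volume heal gv0 full   # populate the new replica in the background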
16:41 hjwp from your ppa, if i'm reading your username correctly?
16:41 semiosis that's me
16:42 semiosis @ppa
16:42 glusterbot semiosis: The official glusterfs 3.3 packages for Ubuntu are available here: 3.3 stable: http://goo.gl/7ZTNY -- 3.3 QA: http://goo.gl/5fnXN -- and 3.4 QA: http://goo.gl/u33hy
16:42 semiosis you might want to try 3.4
16:42 semiosis bbiab
16:43 * hjwp feels honoured
16:43 tjstansell joined #gluster
16:51 Deformative joined #gluster
16:53 vpshastry joined #gluster
16:54 vpshastry left #gluster
16:54 bulde joined #gluster
17:02 forest joined #gluster
17:05 bit4man joined #gluster
17:22 satheesh joined #gluster
17:37 portante joined #gluster
17:41 hjwp ouch. sure enough, the penalty you pay for replicating across 3 servers more than outweighs the bonus of having the local replica:
17:41 hjwp https://docs.google.com/spreadsheet/ccc?key=0AhhyVcO7qVAZdGs1ZkpSY1FjMmZ1TUh0WkRLVkEyT0E
17:41 glusterbot <http://goo.gl/WR5GU> (at docs.google.com)
17:53 hlieberman joined #gluster
17:54 hlieberman We've got a cluster where the stat() and lstat() calls are taking forever.  Seconds.  ls is instant, ls -l takes a long time, even with only a handful (~25) files in the directory.
17:54 hlieberman Any ideas?
17:56 JoeJulian I'm assuming that "forever" in your use does not actually equate with the dictionary definition.
17:56 hlieberman Forever in computer terms - seconds.
17:56 JoeJulian 25 entries takes how many seconds?
17:57 JoeJulian Also, which version?
17:57 hlieberman For 12 files, it takes about 40 seconds.
17:57 hlieberman 3.4-beta3.
17:57 JoeJulian wow
17:57 JoeJulian Any clues in the logs?
17:58 hlieberman There doesn't seem to be anything in the logs at all.  Maybe we have some debug things we can turn on?
17:58 JoeJulian is your log partition full?
17:58 hlieberman Nope.
17:58 hlieberman I mean, anything useful.
17:58 hlieberman Sorry, I need to be more precise to be useful.
17:58 JoeJulian Heh, ok.
17:59 hlieberman Hang on.
17:59 hlieberman May have found something.
18:01 purpleidea joined #gluster
18:01 purpleidea joined #gluster
18:05 phox joined #gluster
18:06 phox hey.  finding that trying to get a directory listing for a dir containing ballpark 20k files on a filesystem I'm doing some other light stuff on is taking multiple minutes to even start producing output.  anything I can do about this?
18:07 hlieberman phox, Are you using bash?
18:07 JoeJulian I like to split them up using the first couple characters of the filename.
18:07 hlieberman phox, Try /bin/ls /path/to/directory
18:07 phox hlieberman: yes.  ok, I'll see how that does.
18:08 hlieberman phox, You may be hitting the same problem as us.
18:08 phox FYI:  ls is aliased to `ls --color=auto'
18:08 phox from 'type ls'
18:08 hlieberman Right.
18:08 hlieberman Exactly.
18:08 phox hm
18:08 hlieberman If this fixes it, it's the lstat() issue.
18:08 * phox trying /bin/ls now
18:08 phox doesn't seem very responsive yet
18:09 * phox adds 'time' to that to see if/when it finishes
18:09 JoeJulian Which isn't an "issue" per-se, but rather a result of the overhead of ensuring consistency when replicating across a network.
18:10 phox no replication in this case, just multiple clients
18:10 hlieberman Well... 12 files -> 40 seconds is an issue. ;)
18:10 JoeJulian hlieberman: Are you replicating across high latency connections?
18:10 hlieberman JoeJulian, over IB
18:10 phox actually, not even multiple clients in this exact case
18:10 phox I have no replication and this is local, FWIW
18:10 hlieberman Maybe not the same, then.
18:11 piotrektt joined #gluster
18:11 phox FWIW these files _were_ recently written out; haven't compared for other dirs that have been there for a while
18:16 Deformative joined #gluster
18:19 phox ok, other dirs do the same crap
18:19 * phox hopes this works eventually otherwise this needs to have gluster taken off the top of it while stuff is being done to it... =/
18:27 CROS_ joined #gluster
18:27 CROS_ Hey guys, is there any documentation on the differences of NFS vs the fuse client?
18:28 CROS_ I noticed in the docs it said that nfs doesn't have automatic failover? Not sure how the fuse client does, though...
18:29 phox the fuse client connects to a gluster server cluster
18:30 phox there is no NFS client; it's just "whatever other NFS client you connect with" and NFS doesn't have failover, so...
18:30 lpabon joined #gluster
18:30 phox so yes the proper client has failover and NFS does not
18:30 phox for some values of "proper"
18:30 phox and once Gluster re-grows RDMA support, FUSE will stop being such a sucky option as there will be far fewer context switches involved
18:31 phox hlieberman: so yeah like 5 minutes later it decided to start listing files for me... yay =/
18:31 phox all the same at least it will punish the people around here who have like 20,000 stupid little tiny retarded files instead of one big 3D or 4D data file :)
18:32 CROS_ So, fuse client connecting to gluster server cluster... Doesn't it connect to a single node to get through to the cluster?
18:32 JoeJulian CROS_: the fuse client connects directly to all the servers. If you use replication, that allows the client to continue accessing the files when a server is down.
18:32 phox CROS_: initially but it does discover the rest of the cluster and is not tied to a single node thereafter
18:32 CROS_ ah, so just in start up it needs that mount server to be there, eh?
18:32 phox obviously if you have it in your fstab and that _one_ server is down it's not gonna connect, but once it's up...
18:33 CROS_ got'cha. makes sense now
18:33 JoeJulian Actually, it needs any ,,(mount server)
18:33 glusterbot (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds, or (#2) Learn more about the role played by the server specified on the mount command here: http://goo.gl/0EB1u
18:33 phox not sure if you can tell it about multiple servers when mounting
18:33 CROS_ hmm, page not found on the bot
18:33 JoeJulian gah
18:34 bit4man joined #gluster
18:34 CROS_ Is it going to tell me that I can specify multiple mount servers in fstab?
18:34 JoeJulian @forget "mount server" 2
18:34 glusterbot JoeJulian: The operation succeeded.
18:34 CROS_ Or just that each client can have their own mount server?
18:35 JoeJulian There is a switch for that... backup-server or something like that. I just use rrdns.
18:36 CROS_ ah, okay
18:36 CROS_ makes sense
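[Editor's note: the switch JoeJulian is half-remembering appears as backupvolfile-server in the mount.glusterfs script of this era (host names here are hypothetical); it only matters for the initial volfile fetch, after which the client talks to all bricks directly.]
    mount -t glusterfs -o backupvolfile-server=server2 server1:/gv0 /mnt/gv0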
18:37 CROS_ Another thing I think I've noticed, when using the fuse client and then doing a crazy directory listing operation or something, the client seems to just go crazy. CPU spikes like nuts and the client becomes unresponsive.
18:37 CROS_ I tried NFS just now and it seemed to handle it okay... It took forever to get the listing back (of like 15k directories), but that's normal. At least the server stayed up.
18:38 CROS_ Is that expected?
18:40 hlieberman JoeJulian, Sorry, still working on it.  I found a /second/ problem which we're fixing.
18:41 JoeJulian cool. I like it when you find your own problems. ;)
18:41 CROS_ Meh, nevermind. Ignore that previous statment/question.
18:42 CROS_ One last question, guys. With an NFS client accessing, will it just hit the server specified in the mount command for the data, and never hit any others? Or will the one hit decide how to forward it on to other ones (2 servers replicated as an example).
18:42 phox a)
18:42 phox NFS client is not very smart that way, and doesn't support that sort of stuff
18:43 forest joined #gluster
18:43 JoeJulian All nfs communications will go to the server specified on the mount. That server will then communicate through the client protocol, bypassing fuse, to the rest of the servers.
18:43 CROS_ k, that's what I figured
18:43 * phox pokes FUSE in the eye
18:44 JoeJulian If you need failover, you can use a virtual ip.
18:44 CROS_ yeah, that's what I'm thinking
18:44 phox yeah, being stateless there is certainly helpful
18:44 JoeJulian btw... there are patches pending for the kernel to change the behavior of fuse with directory listings.
18:44 phox heh, cute
18:45 phox so maybe in some new version, which works for us 'cause we're running bleeding-edge because we can
18:45 CROS_ The other thing (and this may be kind of weird), but if you have NFS client on one of the replicated servers, and you set the mount call on that server/client to point to itself, will it serve all data out from itself by default? Or will it still go to server2 for data as well?
18:45 phox but yeah, mostly having fewer things have to context switch would be really nice
18:45 phox the client still knows exactly what it would if it were on another machine
18:46 phox so it still hits the local NFS server and that still behaves exactly as in the other case above
18:46 CROS_ k
18:46 CROS_ figured that was the case
18:46 CROS_ Well, this makes things a lot clearer...
18:46 * phox does not like 20,000 files in a directory
18:46 CROS_ haha
18:46 phox should not be a problem AT ALL but apparently it is
18:46 JoeJulian That's one of the advantages of the new libglfsapi. Things like qemu, samba, hadoop (soonish), etc. can interface directly with the client without going through fuse.
18:47 CROS_ I have 10k directories in a directory, but never have to actually do a lookoup
18:47 CROS_ listing*
18:47 phox heh
18:47 phox there are some other directories around here with 40M files in a tree because people write stupid code
18:47 CROS_ I will reshuffle it to shallower directories eventually. -_-
18:47 phox people = "scientists"
18:47 CROS_ haha
18:47 CROS_ 40m holy crap
18:48 phox I suppose "let's make a disaster and see what happens" is scientific
18:48 JoeJulian if you can avoid doing listings/lookups on every item in a directory, it shouldn't be a problem. It's only when you stat 12k files, triggering lookup() on each of them which has to wait for self-heal checks...
18:48 CROS_ Actually, I do have another question. =]
18:48 phox but then they need a control case.  and then I can delete their non-control case and all will be well :)
18:49 CROS_ My architecture/needs are possibly a little weird. But, anyway, I currently just have two servers set up that replicate each other. All is well. However, they are basically file servers for serving out some pretty massive files/bandwidth. To reduce the amount of network saturation, I'd like to just serve the files off the machine that I hit...
18:49 JoeJulian That's how I should make my 2nd million (I'm trying for the 2nd because it's supposed to be easier than the first)... filesystem training for scientists.
18:50 CROS_ So, an example. I have data-01 and data-02. I tell whoever needs to get a file to go to data-01 to get their file. They go there and I would like to serve out that file from the machine.
18:50 phox JoeJulian: as a venture capitalist I offer you this clue-by-four.  I expect appropriate ROI within the year.
18:50 CROS_ data-01 gluster decides to serve from data-02, then I have some private network saturation when data-01 actually has the file
18:51 CROS_ And, so, since I'm cheap and don't yet want to build out better networking between the two, I just serve out from the actual brick's /export directory. But then self-heals don't work. haha
18:51 phox saturation?  go buy some IB hardware :)
18:52 CROS_ *ahem* cheap. =]
18:52 hlieberman We saturate our QDR IB fabric. >.>
18:52 hlieberman Just not... you know.  ls.
18:53 CROS_ It works fine. The only problem is that it doesn't support any sort of failover for the actual serving of the files (for writes I go through the mount so that should failover just fine and not lose data). And if data-02 goes down, and then comes back up, data-01 has the files that changed since data-02 was down.
18:54 CROS_ So, to self-heal I need to just crawl through all files, right?
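[Editor's note: in 3.3 the self-heal daemon normally takes care of this, and "gluster volume heal gv0 full" asks it to sweep everything; the older manual crawl, which forces a lookup (and therefore a heal check) on every file, looks roughly like this when run from a FUSE mount of the volume.]
    find /mnt/gv0 -noleaf -print0 | xargs --null stat >/dev/null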
18:55 dbruhn grr I have a corrupted brick in a distributed-replicated system. What's the process for adjusting the brick, formatting it, and then rebuilding it
18:58 jclift_ joined #gluster
19:01 nordac joined #gluster
19:01 lbalbalba joined #gluster
19:01 phox emptying the brick = delete stuff from it, unless you mean the underlying FS is corrupted
19:01 phox as to actually making gluster let go of it then re-use it, dunno
19:01 phox BBL
19:03 dbruhn The underlying FS is corrupted
19:03 JoeJulian dbruhn: That's what I would do.
19:03 dbruhn I need to format it, get it back into a consistent state and then have gluster repopulate it from its replica
19:04 dbruhn so just format it and gluster will repopulate it?
19:04 JoeJulian I'd kill the server for that brick, do that, then gluster volume start $vol force
19:05 dbruhn the server is offline right now, but the system is still up and running. I am assuming I need to stop the whole system before issuing the force command?
19:06 lbalbalba hi. you guys bust at the moment ?
19:06 JoeJulian dbruhn: Nope, force will just force it to start that brick that's not running.
19:07 dbruhn There are no commands that need to be run after to tell it to check for all the missing directories/files?
19:07 dbruhn Sorry for all the questions, just making sure I totally understand
19:07 JoeJulian dbruhn: I would probably do a heal...full
19:07 dbruhn ok
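[Editor's note: a condensed sketch of the sequence JoeJulian outlines, assuming the volume is called gv0; depending on the exact 3.3.x build, the freshly formatted brick may also need its volume-id extended attribute restored before the brick process will start.]
    # on the affected server: kill the brick's glusterfsd, rebuild the filesystem,
    # remount it at the original brick path, then from any peer:
    gluster volume start gv0 force   # restarts only the brick that is down
    gluster volume heal gv0 full     # repopulate it from its replica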
19:07 dbruhn should I be turning quorum on for 3.3.1?
19:08 JoeJulian That's a use-case decision.
19:08 CROS_ left #gluster
19:08 JoeJulian lbalbalba: This here's a jam for all the fellas, tryin
19:08 JoeJulian damn... missed the apostrophe destroying what might have been a funny gag.
19:09 dbruhn lol
19:10 lbalbalba when running 'gluster volume create' on 'host2', im getting this error: "volume create: patchy: failed: Host host1.localdomain is not in 'Peer in Cluster' state". 'host1' used to be part of the trusted pool, but currently is no longer.
19:10 lbalbalba i tried removing /var/lib/glusterd, i tried 'gluster peer detach host1.localdomain', but that doesnt fix it.
19:11 JoeJulian lbalbalba: my guess would be that host1 was part of an existing volume.
19:12 dbruhn the same principle from joe's split brain article cleans it up
19:12 dbruhn http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
19:12 glusterbot <http://goo.gl/FPFUX> (at joejulian.name)
19:12 dbruhn if I remember
19:12 dbruhn wait no
19:12 dbruhn sorry
19:12 JoeJulian dbruhn: I was expecting that too. :D
19:13 dbruhn lol
19:13 dbruhn Can you start predicting my raid controller failures from Dell too?
19:14 JoeJulian maybe... are you using smartd? ;)
19:15 dbruhn I am on my third controller in this single server, seems to be stable now, but the mess of server crashes from the resulting failed controller? *sigh*
19:16 lbalbalba 'gluster volume status' returns 'No volumes present'. i tried removing the entire path im using in the 'create volume' command. doesnt fix the issue
19:16 cfeller joined #gluster
19:18 ingard JoeJulian: do you have any good pointers for performance tuning for a write heavy environment?
19:18 ingard atm it seems the default generated config makes things very slow
19:19 ingard flushing to disk can take 20+ secs
19:20 JoeJulian low latency connections, the lower the better, deadline or noop on the servers (depends on use and load. Test to see which is best for you.)
19:20 JoeJulian IB RDMA if you can.
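[Editor's note: a sketch of switching the elevator on a brick's block device as suggested above; sdb is a hypothetical device, and the change does not survive a reboot unless repeated from a boot script or the kernel command line.]
    echo deadline > /sys/block/sdb/queue/scheduler
    cat /sys/block/sdb/queue/scheduler   # the active scheduler is shown in brackets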
19:21 failshell wish we'd go with IB
19:22 failshell 1Gbps is not super
19:23 ingard we're stuck with 1gbE
19:23 JoeJulian how much data are you flushing that it takes 20 seconds?
19:24 ingard we flush max 4mb buffers
19:24 ingard but its not any quicker if the buffer itsnt full
19:24 ingard even stating/listing files can take that long
19:24 JoeJulian Should be around a gig for that much latency on a write.
19:24 ingard indeed
19:24 ingard each client writes
19:24 ingard lets see
19:24 JoeJulian Unless you're replicating to a lot more that 2 bricks.
19:25 JoeJulian s/that/than/
19:25 glusterbot What JoeJulian meant to say was: Unless you're replicating to a lot more than 2 bricks.
19:25 ingard hehe
19:25 ingard neat
19:25 ingard this is not with replicating
19:25 ingard but distributed 6 servers with 4 bricks each
19:26 * JoeJulian raises an eyebrow...
19:26 JoeJulian What version?
19:27 ingard 3.0.5
19:27 * JoeJulian beats ingard with a version stick.
19:27 ingard i know i know
19:27 ingard we've had the version discussion before if u remember
19:27 ingard 6PB etc etc
19:27 JoeJulian yeah
19:28 ingard but it hasnt always been this slow
19:28 JoeJulian I just forget which versions get attached to which nicks. :D
19:28 jclift_ 6PB and 1GbE.
19:28 jclift_ Ouch. :(
19:29 cfeller Gluster performance setting question (setting up some new hardware):  On my bricks, each brick will be RAID 10, behind a PERC H700 RAID controller.  On the RAID controller, is there a performance difference based off of the read ahead policy?  (e.g., if I set "no read ahead", or "adaptive read ahead", or "read ahead"?)
19:29 JoeJulian Check the client logs and the brick logs. Could be a bug, could be a brick... any all the other hardware possibilities...
19:29 JoeJulian cfeller: That's another use-case dependent question.
19:30 ingard this problem has gradually increased which makes me think its something i could tweak
19:32 ingard we've changed from 3tb drives to 4tb drives a while back which might have been related i guess
19:32 ingard as in more space on each gluster mountpoint
19:32 ingard but still, for instance the io-threads setting
19:32 cfeller JoeJulian: ok so gluster doesn't prefer one or the other, the answer would be largely the same as if I wasn't using gluster, is what you're saying, correct?
19:32 ingard should it be multiplied by how many clients we've got
19:32 ingard or how many concurrent writes we want to support
19:32 ingard ?
19:33 ingard jclift_: we've not got all of it on one big glusterfs tho, its spread around 21 different ones atm
19:33 vdrmrt joined #gluster
19:33 JoeJulian cfeller: pretty much, yes, with the exception possibilities being that the filesystem may be accessed by multiple clients. That may negate some read-ahead benefits.
19:34 cfeller JoeJulian: ok, thanks!
19:35 Deformative joined #gluster
19:42 ingard nothing? c'mon :) JoeJulian you usually got something :)
19:42 z2013 joined #gluster
19:42 vdrmrt Hello, I'm trying to mount a gluster volume on a gluster server in fstab in ubuntu but keep getting errors with rpcbind
19:42 ingard $ ping 10.0.30.182
19:42 ingard PING 10.0.30.182 (10.0.30.182) 56(84) bytes of data.
19:42 ingard 64 bytes from 10.0.30.182: icmp_seq=1 ttl=64 time=0.177 ms
19:42 ingard 64 bytes from 10.0.30.182: icmp_seq=2 ttl=64 time=0.666 ms
19:42 ingard 64 bytes from 10.0.30.182: icmp_seq=3 ttl=64 time=0.271 ms
19:42 JoeJulian hehe, sorry, working on another time-critical issue at the moment....
19:42 ingard 64 bytes from 10.0.30.182: icmp_seq=4 ttl=64 time=0.076 ms
19:42 ingard ^C
19:42 ingard --- 10.0.30.182 ping statistics ---
19:42 ingard 4 packets transmitted, 4 received, 0% packet loss, time 2998ms
19:42 ingard rtt min/avg/max/mdev = 0.076/0.297/0.666/0.224 ms
19:42 ingard i dont think that latency is too bad tho
19:43 ingard (from client to storage node)
19:43 JoeJulian Careful... glusterbot will kick you for flooding.
19:43 ingard right
19:43 * ingard pets glusterbot
19:44 hlieberman JoeJulian, OK.
19:44 JoeJulian If there's no clues in the logs, I'd probably look at the network traffic with wireshark or memory use.
19:44 hlieberman JoeJulian, I remounted the entire system with tcp transport.
19:46 vdrmrt i'm running gluster 3.3.1 on ubuntu 12.04
19:46 hlieberman And I'm checking the logs on the gluster volume servers and on the client server.  Nothin'.
19:47 vdrmrt with the ppa packages from semiosis
19:47 dbruhn is there some way to force gluster to release the log files, I tried the log rotate but it didn't seem to work
19:47 JoeJulian vdrmrt: is something not working?
19:47 dbruhn and then I deleted the log and it's stuck locked taking up disk space
19:47 JoeJulian dbruhn: I use copytruncate
19:47 JoeJulian Though I think a HUP will do it too.
19:47 dbruhn sweet thanks
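[Editor's note: a sketch of the copytruncate approach JoeJulian mentions, as a logrotate stanza; paths assume the default log locations and the globbing is only illustrative.]
    /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
        weekly
        rotate 4
        compress
        missingok
        copytruncate
    }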
19:48 vdrmrt JoeJulian: when the server boots I get a message that it cannot mount and an error from rpcbind it can't open rpcbind.xdf
19:49 vdrmrt cdr
19:49 vdrmrt xdr
19:49 vdrmrt sry
19:49 vdrmrt ubuntu ask to skip or manual recover
19:49 JoeJulian vdrmrt: Are you mounting via NFS?
19:49 vdrmrt nope glusterfs
19:50 lbalbalba looks like i was the one with the split-brain. nevermind glusterd. cut n pasting too much without looking at the full cmd line ... :oops:
19:50 JoeJulian lbalbalba: hehe
19:50 ingard JoeJulian: i guess i need to look at tcpdumps, but I kinda wanted to have a look at the gluster config file and the different xlators first
19:50 ingard i kinda expect that the default generated configs wont be the perfect fit for my setup
19:51 vdrmrt JoeJulian: my fstab entry 127.0.0.1:gv0                                   /mntpoint     glusterfs       defaults        0       0
19:51 JoeJulian ingard: if I were to just guess, I'd randomly point my finger at memory use.
19:52 JoeJulian @options
19:52 glusterbot JoeJulian: see the old 3.2 options page here http://goo.gl/dPFAf or run 'gluster volume set help' on glusterfs 3.3 and newer
19:53 JoeJulian ingard: but with 3.0, you're kind-of screwed on documentation. The best documentation for that version is the source code.
19:53 JoeJulian You can find all the xlator options at the bottom of the main c file for each xlator.
19:55 ingard memory use seems to be stable tho
19:55 ingard i'm monitoring the memory usage of both the clients and the servers and they're not out of line at all
19:55 JoeJulian vdrmrt: I don't know of any reason that rpcbind should be a problem unless you're using nfs.
19:55 ingard depending on what is expected obviously, but its not growing or anything like that
19:56 JoeJulian ingard: with everything else being the same, the only possible conclusion is that you're encountering a time paradox. Try rebooting your tardis.
19:56 ingard damnit
19:56 vdrmrt JoeJulian: I'm also lost
19:56 ingard i hoped you wouldnt say that :P
19:56 ingard and documentation wise i'm so so screwed
19:57 ingard but i do find blog posts and some other semi useful info in mailinglists etc
19:57 JoeJulian I think some of the old stuff can still be found in searches on the wiki.
19:57 ingard i did find a reference to a bug in the read-ahead xlator for 3.0.5 and a link to the old bugtracker but that link is broken
19:57 vdrmrt JoeJulian: maybe the rpcbind error is not related but my mount fails for an other reason that doesn't show an error when booting
19:58 JoeJulian Do you have the old bug id?
19:58 ingard i do
19:58 ingard hold on
19:58 vdrmrt can I look for errors in some kind of mount log
19:58 JoeJulian semiosis: ^
19:59 ingard hm
19:59 ingard When the gluster devs fix the quick-read bug in v3.0.5 then the sites will be even quicker.
19:59 ingard ""
19:59 ingard http://www.sirgroane.net/2010/03/tuning-glusterfs-for-apache-on-ec2/
19:59 * JoeJulian notes that nobody that runs rpm distros ever complains about boot issues... ;)
19:59 glusterbot <http://goo.gl/NcsJk> (at www.sirgroane.net)
19:59 ingard http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=762
19:59 glusterbot <http://goo.gl/4EF7F> (at bugs.gluster.com)
19:59 glusterbot Bug 762: low, low, ---, dkl, CLOSED DUPLICATE, Missing x permission on /usr/doc/gimp-manual* directory
20:00 ingard it was for stats-prefetch
20:00 ingard my mistake
20:00 JoeJulian oldbug 762
20:00 glusterbot Bug http://goo.gl/wSovg medium, low, 3.2.0, raghavendra, CLOSED WORKSFORME, Random "No such file or directory" error on gluster client when using stat-prefetch
20:00 ingard ohhhhh
20:00 JoeJulian Nice, eh?
20:00 ingard i didnt know about this feature :)
20:00 * ingard pets glusterbot
20:01 ingard "When the gluster devs fix the quick-read bug in v3.0.5 then the sites will be even quicker."
20:01 ingard i wonder which bug he is referring to here tho
20:03 ingard anyway
20:03 ingard at least i've learnt something new today :)
20:05 vdrmrt JoeJulian: btw when I mount after boot it works without a problem mount /mntpoint
20:07 JoeJulian vdrmrt: Sorry, I'm an rpm user, and have been for a very long time. semiosis is the .deb packaged expert.
20:08 vdrmrt JoeJulian: ok np
20:08 JoeJulian If it was an rpm, I'd point out the missing _netdev mount option, but I don't think you guys do it that way.
20:08 vdrmrt i tried that no effect
20:08 vdrmrt also tried nobootwait
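[Editor's note: the fstab line under discussion, with the options usually tried on Ubuntu 12.04 at the time; _netdev is effectively ignored by mountall and nobootwait only stops a failed mount from blocking boot. A common workaround then was to retry the mount from rc.local or an Upstart job once glusterd was up, since a localhost mount races against glusterd starting.]
    127.0.0.1:/gv0  /mntpoint  glusterfs  defaults,_netdev,nobootwait  0  0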
20:09 isomorphic joined #gluster
20:11 vdrmrt btw I'm new to gluster and trying out on some virtual machines and very impressed how easy it is to set up (except from the fstab mounting ofcourse)
20:12 JoeJulian :)
20:12 hlieberman Any ideas what this error indicates: [2013-06-21 20:12:19.521750] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
20:13 JoeJulian hlieberman: port 24007 is blocked, or glusterd is not running.
20:14 hlieberman It is, and bound to 24007.
20:14 JoeJulian ingard: I just remembered something else... if the log partition is full, things get very slow. Check those.
20:14 hlieberman Maybe we should just bounce it.
20:15 JoeJulian hlieberman: perhaps
20:15 JoeJulian hlieberman: also check iptables of course.
20:15 hlieberman First thing I checked. 0 backets dropped.
20:15 hlieberman ... packets.
20:16 hlieberman Nope.  We tried to restart it, still getting the errors.
20:18 hlieberman Any other clues?  Or, something I can do to check?
20:18 JoeJulian telnet to that port?
20:19 hlieberman It connects, but there's nothing that comes back.
20:19 JoeJulian selinux? apparmor?
20:19 hlieberman Nope.
20:19 JoeJulian router?
20:20 JoeJulian connection refused comes from an icmp reply, so it has to be network related.
20:20 hlieberman I can telnet to it over both interfaces.
20:22 hlieberman Is there something I can pass via telnet to get a response back?
20:22 hlieberman a ping or something?
20:22 dbruhn If I said I just randomly started having an RPC error on one of my servers running two bricks?
20:22 hlieberman We also have NFS disabled.
20:30 ingard JoeJulian: aight thnx, the logs go to var/log/gluster though and they've got plenty of space
20:30 JoeJulian hlieberman: nothing that I know of. If you're connecting via telnet, then that's not where the connection refused is coming from. Perhaps that was before glusterd was started?
20:30 hlieberman JoeJulian, I found it.
20:30 hlieberman It's looking for a socket that's not connected in /var/run
20:30 JoeJulian Ah, ok. I see.
20:31 hlieberman No idea what it's expecting to be connected there.
20:31 JoeJulian I forgot about the named pipes....
20:32 hlieberman There's a couple other sockets there that do have things bound.  Two glusterfsd's for the two bricks and a copy of glusterfs.
20:32 hlieberman Any idea what the last named pipe would be for?
20:35 hlieberman Got it.  It's the NFS thing.
20:35 hlieberman OK.  NFS is started, and now we're getting... 0-gv0-client-4: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.30.10.81:1019 peer:10.30.10.82:49158)
20:39 dbruhn gah I have a brick server that's config has gone out to lunch
20:39 dbruhn [xlator.c:385:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
20:39 dbruhn whats the command to copy the config from a known good config
20:45 ingard oldbug 3011
20:45 jclift_ RDMA Connected Mode Event Rejected
20:45 glusterbot Bug http://goo.gl/jt1TN high, urgent, ---, rgowdapp, CLOSED CURRENTRELEASE, Uninterruptible processes writing(reading ? ) to/from glusterfs share
20:45 * jclift_ knows the words, but has no idea what they mean
20:46 hlieberman Yeah.  We're all scratching our heads over here - IB is working just fine.
20:47 dbruhn is your subnet manager allowing you to do ibping commands between nodes?
20:47 failshell can i leave profiling enabled at all time? or is that going to cause a performance drop?
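[Editor's note: profiling is toggled per volume; a minimal sketch assuming a volume gv0. The bookkeeping overhead is generally small but not zero, which is why it ships disabled.]
    gluster volume profile gv0 start
    gluster volume profile gv0 info    # per-brick latency and fop counts, cumulative and interval
    gluster volume profile gv0 stop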
20:49 jclift_ hlieberman: Which version of gluster is that error coming from?  Not the 3.4.0 beta 3 code?
20:50 forest joined #gluster
20:51 * phox wants RDMA back
20:51 hlieberman Yeah.
20:51 hlieberman Latest - 3.4.0b3.
20:51 hlieberman dbruhn, Checking.
20:52 jclift_ hlieberman: Interesting.
20:52 jclift_ hlieberman: Which OS?
20:52 hlieberman Debian Wheezy amd64.
20:52 jclift_ k
20:53 ingard oldbug 1042
20:53 glusterbot Bug http://goo.gl/B9RAe medium, urgent, 3.0.5, pavan, CLOSED CURRENTRELEASE, Use correct flock structures in lk fops
20:53 jclift_ I'm about to try the 3.4.0 beta 4 rpms on RHEL 6.4 test boxes here.  But, my storage on each node is only a single ssd, so doesn't exactly push the bandwidth hard
20:54 hlieberman Our main complaint isn't throughput, it's the ~40 seconds it takes to run an ls -l on a directory with 14 files in it.
20:54 jclift_ Ugh
20:54 jclift_ Yeah, that sounds like a problem
20:55 ferringb hlieberman: got any rebuilds going atm?
20:55 hlieberman How would we check? I don't think so./
20:56 ferringb look in the logs, check system load, etc.
20:56 hlieberman No active volume checks in gluster status.
20:56 hlieberman *tasks
20:56 ferringb 'cept self-heal-daemon being on
20:56 ferringb but that should be low load
20:56 hlieberman No real load or I/O
20:56 ferringb as for the ~40s bit, keep in mind that there is that whole annoying consensus bit- the vfs walk to get to that directory can be pricey
20:57 * ferringb suggests flipping on profiling and checking your logs
20:57 hlieberman OK.  We'll try that.  These errors are concerning, though.
20:57 jclift_ Hmmm, I have two gluster nodes and third box to be the "client".
20:57 ferringb errors?
20:57 hlieberman RDMA_CM_EVENT_REJECTED
20:57 ferringb ah
20:57 jclift_ It sounds like I need to get a hold of a fourth box at some point, so I can have at least 3 gluster storage nodes. :/
20:57 ferringb diagnostics.brick-log-level debug # iirc is what you want
20:58 ferringb and yes, that's likely involved in your issues. ;)
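[Editor's note: a sketch of the option ferringb is recalling, assuming a volume gv0; the matching client-side knob is diagnostics.client-log-level, and INFO is the usual level to drop back to afterwards.]
    gluster volume set gv0 diagnostics.brick-log-level DEBUG
    gluster volume set gv0 diagnostics.brick-log-level INFO   # once done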
20:58 ingard oldbug 1017
20:58 glusterbot Bug http://goo.gl/ObI4f medium, high, ---, pkarampu, VERIFIED , Locking deadlock when upgrading lock
20:58 hlieberman OK.  Let me get that pushed out and see what we can do.
20:58 jclift_ It's good someone's around who knows how to look into this stuff. :)
21:01 hlieberman ferringb, That should output to the brick logs, right?
21:02 ingard oldbug 934
21:02 glusterbot Bug http://goo.gl/07AGv high, low, 3.0.6, raghavendra, CLOSED CURRENTRELEASE, md5sum mismatch when files are transferred using vsftpd
21:02 ferringb hlieberman: yep
21:03 hlieberman Oh dear god, there's a lot of output.
21:03 hlieberman OK, we'll go through it, see if there's anything in there.   Anything in particular we should be looking for?
21:04 * ferringb snickers
21:04 ferringb no clue
21:04 ferringb I've not used the rdma side yet
21:04 ferringb just commenting from the standpoint of how I've been debugging gluster; ramp up the logs to get some info, dig down into the code if the logs don't make sense, and try nailing down the causes of misc. errors. :)
21:06 ingard oldbug 995
21:06 glusterbot Bug http://goo.gl/XewBd high, low, 3.0.6, raghavendra, CLOSED CURRENTRELEASE, memory leak in io-cache
21:06 hlieberman [2013-06-21 21:04:52.065597] D [inodelk.c:303:__inode_unlock_lock] 0-gv0-locks:  Matching lock found for unlock 9223372036854775806-9223372036854775807, by 9421f291ed7f0000 on 0x247fa80
21:06 hlieberman That looks unusual.
21:07 hlieberman There's a lot of INODELK and SETXATTR and XATTROP on this one node.
21:07 hlieberman Same one with the locks.
21:07 ferringb that's not unusual
21:07 hlieberman Also, posix.c:253:posix_do_chmod function not implemented.
21:07 jclift_ hlieberman: With the connected mode rejection request, this is the only online info about it I could see: http://linux.die.net/man/3/rdma_reject
21:07 glusterbot Title: rdma_reject(3) - Linux man page (at linux.die.net)
21:07 ferringb hlieberman: distributed FS w/out a master metadata node controlling things... that means lots of generated locks per node as operations are proceeding, just to keep things stable for the request
21:08 hlieberman Makes sense.
21:08 ferringb there is a *ton* of lock traffic though, generally speaking
21:08 ferringb either way, showering; for the logs, someone may be able to help, but I honestly have just been walking the source in conjunction to diagnostic logs to make sense of some of the more screwed up gluster behaviour
21:09 hlieberman Yeah.  There doesn't even seem to be anything particularly helpful in the logs, even.
21:09 jclift_ hlieberman: I wonder if the super long dir listing times is because some first attempt to connect to one of the nodes is failing, so it's having to do some kind of fallback
21:09 hlieberman jclift_, That's my assumption, but I can't prove it.
21:10 jclift_ I have no idea how to build .deb's for Gluster
21:10 hlieberman Painfully, unfortunately.
21:11 failshell georepl from 3.2 to 3.3 doesnt seem to work :(
21:11 jclift_ With rpms, I'd find the line in the rdma code in gluster where it does connection handling (eg the rejection bit), and add a shitload of new log lines saying wtf is happening
21:11 jclift_ Then compile new rpms (trivially easy)
21:11 jclift_ Not a guaranteed win, but useful for seeing if things are really a cause of not
21:11 jclift_ Ugh.  I'm getting distracted.
21:12 * jclift_ gets back to work
21:12 hlieberman Hee.
21:12 hlieberman Yeah, we're almost there ourselves.
21:14 dbruhn5 joined #gluster
21:17 dbruhn5 ks
21:18 hlieberman Is there a way to change the transport type without destroying the entire gluster volumes?
21:18 failshell anyone doing georepl from 3.2 to 3.3?
21:18 dbruhn5 left #gluster
21:19 ingard oldbug 1599
21:19 glusterbot Bug http://goo.gl/Nt6jl low, low, ---, amarts, CLOSED CURRENTRELEASE, can we build glusterfs with -O0 -g by default
21:23 ingard JoeJulian: https://bugzilla.redhat.com/show_bug.cgi?id=763331
21:23 ingard hehe
21:23 glusterbot <http://goo.gl/Nt6jl> (at bugzilla.redhat.com)
21:23 glusterbot Bug 763331: low, low, ---, amarts, CLOSED CURRENTRELEASE, can we build glusterfs with -O0 -g by default
21:23 ingard quite funny that last comment :>
21:32 joeljojojr joined #gluster
21:36 joeljojojr Hi. I currently have a distributed-replicated volume (2 x 2 = 4) where I lost a mirrored brick. Due to some unwise and "I don't want to talk about it" rsync-ing, I am left with a lot of files that I want to keep that are missing gfid entries in the .glusterfs directory. As a result, it's not "healing" them. Worse yet, every time something tries to do a directory listing where some of these files exist, it starts running a self-heal process. Since it can't fix
21:36 joeljojojr those files, it tries to run self-heal again the next time the directory is accessed, which is bringing my server to its knees. Can anyone tell me how to get gluster to just accept the files that have a gfid absence, and heal them?
21:45 hlieberman jclift_, We remounted it without rdma - straight TCP - and it's hanging at a directory.  Just straight up hung.
21:45 hlieberman Nothing in any of the logs, not on the servers or the client.
21:49 dbruhn what version?
21:49 hlieberman Latest.  3.4.0-beta3
21:52 hlieberman There isn't anything useful in the logs as far as I can see, anymore.  The RDMA errors went away when we changed the transport to tcp only, and the cannot connect error isn't happening because NFS is running.
21:57 jclift_ hlieberman: That is weird.
21:57 hlieberman Extremely.
21:58 hlieberman Any ideas?  We're at a loss about even where to look.
21:58 jclift_ hlieberman: Maybe create a BZ about it, with all the info you can find, and hope the dev's have some good ideas about it on Monday? :)
22:00 hlieberman Could it possibly be an interaction with ZFS?
22:01 jclift_ Interesting.
22:01 jclift_ I really don't know.
22:01 jclift_ I don't have enough practical experience with debugging issues personally, to have any good clue with this yet. :(
22:09 abyss^ joined #gluster
22:12 tjstansell left #gluster
22:16 hlieberman Is 3.3.1 recommended over 3.3.2?
22:19 Deformative joined #gluster
22:30 fidevo joined #gluster
22:33 DataBeaver joined #gluster
22:41 duerF joined #gluster
22:43 joelwallis joined #gluster
22:52 Nagilum_ 3.3.2 hasn't been released yet
22:52 neofob left #gluster
23:11 rcoup joined #gluster
23:21 dowillia left #gluster
23:26 manik joined #gluster
23:36 Shahar joined #gluster
23:40 forest joined #gluster
23:51 joelwallis joined #gluster
