
IRC log for #gluster, 2014-03-01


All times shown according to UTC.

Time Nick Message
00:23 chirino_m joined #gluster
00:30 wrale The admin guide has me confused on the topic of distributed replication.. It says: 'To make sure that replica-set members are not placed on the same node, list the first brick on every server, then the second brick on every server in the same order, etc.' ... In my case, I have six nodes with two bricks each... Does this mean I cannot go any higher than replica 2 using distributed replication, while keeping to an 'any two nodes can fall out in a healthy cluster' policy? Am I missing something?
00:31 wrale (It also says: Each replica_count consecutive bricks in the list you give will form a replica set, with all replica sets combined into a volume-wide distribute set.)
00:31 kris joined #gluster
00:32 theron joined #gluster
00:33 nueces joined #gluster
00:35 hagarth joined #gluster
00:37 wrale Or is it that I would have four 'replica sets' (s1:b1, s2:b1, s3:b1), (s4:b1, s5:b1, s6:b1), (s1:b2, s2:b2, s3:b2), (s4:b2, s5:b2, s6:b2)... and my 3 replicas would go into three of these four slots, randomly or based on load?
00:37 JoeJulian @brick order
00:37 glusterbot JoeJulian: Replicas are defined in the order bricks are listed in the volume create command. So gluster volume create myvol replica 2 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 server4:/data/brick1 will replicate between server1 and server2 and replicate between server3 and server4.
00:37 JoeJulian And for distribute, see
00:38 JoeJulian @lucky dht misses are expensive
00:38 glusterbot JoeJulian: http://joejulian.name/blog/dht-misses-are-expensive/
00:38 wrale Thanks.. really trying to understand and stop nagging
00:38 JoeJulian Also @lucky wikipedia dht distributed hash
00:39 JoeJulian Why did I do that? I know that's not a valid syntax...
00:39 JoeJulian @lucky wikipedia dht distributed hash
00:39 glusterbot JoeJulian: http://en.wikipedia.org/wiki/Distributed_hash_table
00:39 JoeJulian wrale: No worries. Just imagine figuring all this out without anyone around to answer questions. That's me 4 years ago.
00:40 wrale JoeJulian: no envy on that... This stuff is a constant brain teaser
00:40 JoeJulian So I suppose this is going to be rather cheeky of me now: https://docs.google.com/spreadsheet/ccc?key=0Aop9a4V-9wr-dDhTQ294aFJMMGR0Z0hLNm9zZVdWZ0E#gid=0
00:40 glusterbot Title: Google Drive (at docs.google.com)
00:41 wrale Yeah.. I figured it out earlier on my own.. felt pretty stupid.. just divide the total by the number of replicas, i suppose.. yeah
00:42 JoeJulian I aim to please...
00:42 wrale So for my 2*3TB on six nodes .. that's 36TB divided by three replicas is 12TB
00:43 wrale That number 12 bums me out, honestly.. Too few disk slots in these servers.. Lots of regrets
00:43 wrale If i do replica two, i can only lose one server and no more
00:43 wrale (it seems)
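
For reference, the usable-capacity arithmetic wrale is working through above, as a quick shell sketch; the 6-node, 2 x 3 TB-brick, replica-3 layout is taken from the conversation:

    # raw capacity: 6 nodes x 2 bricks/node x 3 TB/brick = 36 TB
    # usable capacity of a replica-3 distributed-replicated volume = raw / replica count
    echo $(( 6 * 2 * 3 / 3 ))TB    # prints 12TB usable
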
00:45 cjanbanan joined #gluster
00:46 wrale I honestly think the issue for me is the tiny scale.. If I were thinking about a larger cluster, where there would be multiple volumes, I think it would seem more simple.. everything is so wrapped up in these six nodes, that i get profusely confused
00:46 wrale (one big volume is going into this for ovirt)
00:48 wrale (Reason being -- "distributed" would have a more spatial meaning than it seems to at this scale)
00:48 JoeJulian right
00:50 wrale Cool.. Didn't know YaCy used a dht
00:51 kris joined #gluster
00:56 mattappe_ joined #gluster
00:59 vravn joined #gluster
01:09 mattappe_ joined #gluster
01:09 wrale So then.... I'd have four replica sets.. Each with three bricks, spanning three hosts per (three copies of file X)... Replica sets 1 and 3 share a set of three hosts ... Sets 2 and 4 do as well.. Placing files into this distributed replicated volume: file 1 goes to set 1's three nodes.. file 2 goes to set 2...and so on.. until file 5 goes into set 1, etc.. (i guess)
01:13 wrale s/spanning three hosts per/spanning three hosts per replica set/
01:13 glusterbot What wrale meant to say was: So then.... I'd have four replica sets.. Each with three bricks, spanning three hosts per replica set (three copies of file X)... Replica sets 1 and 3 share a set of three hosts ... Sets 2 and 4 do as well.. Placing files into this distributed replicated volume: file 1 goes to set 1's three nodes.. file 2 goes to set 2...and so on.. until file 5 goes into set
01:13 glusterbot 1, etc.. (i guess)
01:15 chirino joined #gluster
01:16 wrale Drawing on this from the admin guide: The number of bricks should be a multiple of the replica count for a distributed replicated volume. Also, the order in which bricks are specified has a great effect on data protection. Each replica_count consecutive bricks in the list you give will form a replica set, with all replica sets combined into a volume-wide distribute set. To make sure that replica-set members are not placed on the same node, list the first brick on every server, then the second brick on every server in the same order, and so on.
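
As a sketch of what that ordering rule gives for the six-node, two-bricks-per-node, replica-3 layout being discussed; the server names s1..s6, the brick paths, and the volume name are hypothetical:

    # list the first brick on every server, then the second brick on every server,
    # so each group of 3 consecutive bricks (one replica set) spans 3 different nodes
    gluster volume create myvol replica 3 \
        s1:/data/brick1 s2:/data/brick1 s3:/data/brick1 \
        s4:/data/brick1 s5:/data/brick1 s6:/data/brick1 \
        s1:/data/brick2 s2:/data/brick2 s3:/data/brick2 \
        s4:/data/brick2 s5:/data/brick2 s6:/data/brick2
    # resulting replica sets: (s1,s2,s3) (s4,s5,s6) (s1,s2,s3) (s4,s5,s6),
    # i.e. the four replica sets wrale describes at 00:37
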
01:19 * wrale visualizes the intro the Kung Fu series https://www.youtube.com/watch?v=ZY6HgFyoupw
01:19 glusterbot Title: Kung Fu TV Show Intro - YouTube (at www.youtube.com)
01:19 wrale lol.. standing in the rain.. must know glusterfs
01:25 wrale interesting: http://pl.atyp.us/tag/glusterfs.html
01:25 glusterbot Title: Platypus Reloaded - glusterfs (at pl.atyp.us)
01:26 edong23 joined #gluster
01:30 mattappe_ joined #gluster
01:34 Guest19520 joined #gluster
01:37 wrale So Jeff Darcy used to work at SciCortex.. Awesome.. Theirs was the first HPC on which I was admin.. SC5832
01:37 wrale s/SciCortex/SiCortex/
01:37 glusterbot What wrale meant to say was: So Jeff Darcy used to work at SiCortex.. Awesome.. Theirs was the first HPC on which I was admin.. SC5832
01:38 wrale Spelling!
01:39 wrale Wow.. one of ~75 such machines.. good to know
01:39 * wrale shuts up
01:40 sroy joined #gluster
01:46 mattapperson joined #gluster
01:48 vpshastry joined #gluster
01:50 vravn joined #gluster
01:57 bala joined #gluster
02:08 askb joined #gluster
02:18 glusterbot New news from newglusterbugs: [Bug 1071504] rpmbuild/BUILD directory needs to be created for CentOS 5.x <https://bugzilla.redhat.com/show_bug.cgi?id=1071504>
02:30 kris1 joined #gluster
02:35 ira joined #gluster
02:40 kris2 joined #gluster
02:46 chirino_m joined #gluster
03:01 nightwalk joined #gluster
03:02 jporterfield joined #gluster
03:03 jmarley joined #gluster
03:04 mattappe_ joined #gluster
03:08 RicardoSSP joined #gluster
03:08 RicardoSSP joined #gluster
03:14 psyl0n joined #gluster
03:19 bala joined #gluster
03:21 jporterfield joined #gluster
03:24 kris joined #gluster
03:24 wrale Should 'performance.io-thread-count' match the number of CPU cores, or is there little correlation, from a tuning perspective?
03:25 wrale Also, is there any tuning I should do for jumbo frames, inside gluster or in setting up a volume?
03:26 mattappe_ joined #gluster
03:38 jporterfield joined #gluster
03:53 elyograg wrale: I think I'd use 2 or 3 less than the total number of cores.  leave some available for the system to handle other things that gluster might need to do.  as for jumbo frames, my guess (which could be totally wrong) is that if you've got the OS and the switch configured right, the software should just work.  I've spent a few minutes looking for a gluster setting for jumbo frames and there doesn't seem to be one.
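
For concreteness, a hedged sketch of the two knobs being discussed; the volume name, interface name, and peer address are placeholders:

    # io-thread-count is a per-volume translator option (default 16)
    gluster volume set myvol performance.io-thread-count 16
    # jumbo frames are an OS/switch setting, not a gluster one: raise the MTU on the
    # storage NIC and verify a 9000-byte frame gets through without fragmenting
    ip link set dev eth0 mtu 9000
    ping -M do -s 8972 <peer-storage-ip>    # 8972 = 9000 minus IP + ICMP headers
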
03:55 wrale elyograg:  thanks for the help.. i was considering tweaking the cache values, but i think i might benchmark without that to see where my baseline is
03:56 elyograg i should probably use jumbo frames.  i assume you need it on all gluster server nics, the switches, and all clients?  I am wondering what happens with clients talking to other servers that aren't doing jumbo frames.
03:58 wrale elyograg: i was wondering that, too.. doing so would mean i would have needed split-horizon dns, because of my multihomed boxes.. storage at mtu9000 clients at 9000 or sometimes 1500... I gave that up and went for 9000 private networking inside the cluster.. I imagine I'll use something like owncloud to support my users' generic storage needs, to bridge the gap
03:59 wrale maybe even samba4, backed by gluster..
03:59 wrale (via vm)
03:59 elyograg i've wanted to avoid having a separate network for client access, because i don't want to put an extra nic in everywhere.  if i ever get real funding, i'd want infiniband, which would be a separate network.
04:00 wrale i regret not going infiniband.. too many knobs to turn when ordering $500k in hardware under pressure
04:00 vravn joined #gluster
04:01 elyograg wow, i wish i could get a budget like that.  i'd have probably bought isilon, though. :)
04:03 wrale luckily i have enough trays to more than double the capacity of this six node cluster.. yeah.. i've heard good things about isilon... (the 500k bought 72 nodes in total, but the remaining 66 will run user jobs and not gluster)
04:04 wrale i'm going all the way though.. btrfs everywhere plus lzo compression.. :)
04:04 wrale beta everything
04:05 wrale i'm writing a guide on what i'm doing.. will try to post here if it's ever worthwhile
04:06 elyograg we use CentOS for stability and because RHEL/CentOS is the only way to really get Dell's server admin software to work right.  The 2.6 kernel is quite old, so I don't dare run btrfs on it.
04:08 wrale Yeah.. I started with c6.5, but learned better about their btrfs support.. was hoping to also use SR-IOV... i tried the 3.13 kernel-ml package.. (to try to get to some btrfs stability)... that worked, but iproute2 was no longer synced with the kernel and so SR-IOV had issues vs. iproute2.. i called it a day and moved on to f19 (per ovirt limits) (no f20 support yet)
04:08 wrale (and since have dropped sr-iov).. can't look back though.. timeline
04:09 elyograg I'd love to have snapshots, even if they're only brick level.
04:09 wrale i'm excited about that.. and copy on write
04:09 wrale using 3.5b3.. which i guess has better support for btrfs.. speed reading isn't always best..lol
04:11 vpshastry joined #gluster
04:11 vpshastry left #gluster
04:12 vravn joined #gluster
04:14 gmcwhistler joined #gluster
04:17 gmcwhist_ joined #gluster
04:18 kris joined #gluster
04:21 dorko joined #gluster
04:21 DV joined #gluster
04:46 wrale Any issue with volumes containing hyphens in the middle of their name?
04:46 wrale (just being cautious)
04:49 kris joined #gluster
04:57 vravn joined #gluster
05:08 codex joined #gluster
05:14 samppah wrale: not that i know of.. i'm using them and haven't run into any issues
05:19 kris joined #gluster
05:22 dusmant joined #gluster
05:26 wrale samppah: thanks
05:34 pvh_sa joined #gluster
05:41 wrale Useful info: https://lists.gnu.org/archive/html/gluster-devel/2008-05/msg00362.html  .. Now, if only I knew if this is per brick, per volume or per server.. ugh
05:41 glusterbot Title: Re: [Gluster-devel] Setup recommendations - 2 (io-threads) (at lists.gnu.org)
05:45 wrale seems like an important thing to know.. maybe i'll open a documentation bug
05:56 kris joined #gluster
06:09 wrale Is this normal under load (iozone) on six node, two brick per node, replica 3, distributed-replicated over 10GbE? glusterfs: [2014-03-01 06:02:34.044146] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-benchmark-vol-n1-client-4: server 10.30.3.5:49152 has not responded in the last 42 seconds, disconnecting.
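
The 42 seconds in that log line is gluster's default network.ping-timeout. A sketch of how one might raise it while chasing a load-induced disconnect, assuming the volume is the benchmark-vol-n1 named in the message; note that raising it only papers over a brick that is too busy to answer:

    gluster volume set benchmark-vol-n1 network.ping-timeout 60
    gluster volume info benchmark-vol-n1    # the new value appears under 'Options Reconfigured'
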
06:17 cjanbanan joined #gluster
06:19 jporterfield joined #gluster
06:31 jporterfield joined #gluster
06:31 marcoceppi joined #gluster
06:31 marcoceppi joined #gluster
06:32 haomaiwa_ joined #gluster
06:39 kris joined #gluster
06:42 haomaiwa_ joined #gluster
06:46 haomaiwa_ joined #gluster
06:48 wrale Wow iozone and gluster don't get alone very well.. 2kb file test is of epic length for 4GB
06:48 wrale *along
06:51 bala joined #gluster
06:55 jporterfield joined #gluster
07:01 haomai___ joined #gluster
07:18 hagarth joined #gluster
07:21 chirino joined #gluster
07:56 ekuric joined #gluster
08:00 ctria joined #gluster
08:06 askb joined #gluster
08:07 fidevo joined #gluster
08:10 bala joined #gluster
08:12 jporterfield joined #gluster
08:31 jporterfield joined #gluster
08:35 ProT-0-TypE joined #gluster
08:41 jporterfield joined #gluster
08:54 jporterfield joined #gluster
09:10 bala joined #gluster
09:12 edong23 joined #gluster
09:34 jporterfield joined #gluster
09:57 jporterfield joined #gluster
09:59 vpshastry joined #gluster
10:03 pvh_sa joined #gluster
10:09 jporterfield joined #gluster
10:59 calum_ joined #gluster
11:05 ira joined #gluster
11:53 psyl0n joined #gluster
12:08 pvh_sa wrale, interesting tests. have you tried with bonnie++ ? i'll do my own testing next week
12:23 jporterfield joined #gluster
12:27 uebera|| joined #gluster
12:28 mattapperson joined #gluster
12:31 nixpanic joined #gluster
12:31 nixpanic joined #gluster
12:42 primechuck joined #gluster
13:06 jporterfield joined #gluster
13:07 reuss joined #gluster
13:08 reuss so -- i have absolutely zero experience with gluster but am currently experiencing massive load on a bunch of nodes -- and from what i can dig out from log files, it's currently doing a lot of self-healing - i have no idea why or what happened -- what would be my first course of action to determine what is going on?
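
A hedged sketch of the usual first checks for a self-heal storm like reuss describes; the volume name is a placeholder:

    gluster volume heal myvol info              # files/entries currently queued for heal
    gluster volume heal myvol info split-brain  # anything the heal daemon cannot resolve on its own
    # then check /var/log/glusterfs/glustershd.log on the servers for why healing started
    # (brick restarts, network drops, etc.)
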
13:09 KORG joined #gluster
13:15 diegows joined #gluster
13:20 mattapperson joined #gluster
13:27 chirino_m joined #gluster
14:00 chirino joined #gluster
14:08 mattapperson joined #gluster
14:10 jporterfield joined #gluster
14:20 jporterfield joined #gluster
14:20 sprachgenerator joined #gluster
14:28 sroy_ joined #gluster
14:47 P0w3r3d joined #gluster
15:03 psyl0n joined #gluster
15:19 cjanbanan joined #gluster
15:23 vpshastry joined #gluster
16:00 NeatBasis joined #gluster
16:01 mattapperson joined #gluster
16:22 vpshastry joined #gluster
16:28 mojorison joined #gluster
16:34 davinder2 joined #gluster
16:40 haomaiwa_ joined #gluster
16:42 jporterfield joined #gluster
17:00 cjanbanan joined #gluster
17:11 jporterfield joined #gluster
17:27 RameshN joined #gluster
17:32 rahulcs joined #gluster
17:37 hagarth joined #gluster
17:39 rahulcs joined #gluster
18:12 rahulcs joined #gluster
18:31 psyl0n joined #gluster
18:32 rotbeard joined #gluster
18:32 elyograg how likely is it that centos 6.4 has NFS client or rsync bugs that cause problems with gluster?  I was having severe problems loading data onto gluster 3.3.1 via NFS.  The brick servers would get pathologically high load and the rsync processes would lock up.  Now that I've upgraded to 3.4.2, the client still has rsync lock up, but now gluster only sees a very slight elevation in load and seems to take the whole thing in stride.
18:35 Pavid7 joined #gluster
18:38 cp0k_ joined #gluster
18:52 rahulcs joined #gluster
18:53 ProT-0-TypE joined #gluster
18:56 chirino joined #gluster
18:57 kris joined #gluster
19:01 purpleidea elyograg: i can't speak for NFS, but i'm fairly confident rsync itself doesn't have any bugs relating to this.
19:02 elyograg That would be my guess as well.
19:02 purpleidea maybe check your network for errors? bad cables? dropped packets? look in the gluster logs... etc
19:03 samppah @rsync
19:03 glusterbot samppah: normally rsync creates a temp file, .fubar.A4af, and updates that file. Once complete, it moves it to fubar. Using the hash algorithm, the tempfile hashes out to one DHT subvolume, while the final file hashes to another. gluster doesn't move the data if a file is renamed, but since the hash now points to the wrong brick, it creates a link pointer to the brick that actually has the
19:03 glusterbot samppah: data. to avoid this use the rsync --inplace option
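
A minimal example of the workaround glusterbot describes, with made-up paths:

    # --inplace writes straight into the destination file instead of a dot-temp file,
    # so the final name hashes to the brick the data actually landed on (no DHT link pointer)
    rsync -av --inplace /srv/source/ /mnt/glusterfs/dest/
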
19:04 purpleidea ^^ good point!
19:04 elyograg I now have monitoring alerts whenever an error shows up in the standard gluster logs.  None has showed up recently.
19:07 elyograg I may have spoken too soon.  On the server where NFS access is done (which has no bricks) I do see this: http://fpaste.org/81560/13937007/  I am UTC-7, so for me this would have been 10:39 last night.  The rsync process that locked up was making directories in the location shown.  It wasn't copying files at the time.
19:07 glusterbot Title: #81560 Fedora Project Pastebin (at fpaste.org)
19:08 elyograg there are no entries in either heal info or heal info split-brain.
19:10 elyograg I've got another rsync going on the same machine that *is* copying files, still working.  On 3.3.1 when this problem showed up, any other access to the volume (additional rsyncs, du, etc) would lock up entirely and the server load on the brick servers would go pathological -- we got over 40 once.  With 3.4.2, the server side seems completely fine.
19:10 cp0k_ joined #gluster
19:11 elyograg but the client problems continue.
19:13 elyograg I do need to use --inplace.  I think I'll need to rebalance this volume now.  Slow, but I know that 3.4.2 will handle it OK.  3.3.1 had severe bugs that caused data loss for us when we tried to add storage and rebalance.
19:14 chirino joined #gluster
19:16 eryc joined #gluster
19:26 purpleidea elyograg: you're not out of free space on any of the bricks, right?
19:26 elyograg nope.  I think it's about 15 percent.
19:27 elyograg the volume where we did the rebalance wasn't either.  I think it was about two thirds full, then we added the same amount of space again.  On that, the first set of bricks is about 90% full and the second about 50%, whole volume at 68%.
19:27 elyograg Still need to do the rebalance, and now that we've upgraded gluster, we can.
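
For reference, the rebalance elyograg is planning is roughly the following (volume name is a placeholder); it runs as a background job that you poll for progress:

    gluster volume rebalance myvol start
    gluster volume rebalance myvol status    # per-node counts of files scanned and moved
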
19:29 purpleidea sweet! well hope it works out :)
19:29 ninkotech joined #gluster
19:29 ninkotech_ joined #gluster
19:43 mattapp__ joined #gluster
20:02 nightwalk joined #gluster
20:08 calum_ joined #gluster
20:39 jporterfield joined #gluster
20:41 ProT-0-TypE joined #gluster
20:57 ProT-0-TypE joined #gluster
21:20 rahulcs joined #gluster
21:24 doekia to prevent php latency I have made a distributed partition image. when I change a file it does not get replicated ... even sync and stat() ... only a remount reflects the changes. any hint?
21:34 Guest19520 joined #gluster
21:36 doekia ~php | doekia
21:36 glusterbot doekia: (#1) php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://joejulian.name/blog/optimizing-web-performance-with-glusterfs/ for details., or (#2) It could also be worth mounting fuse with glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH
21:36 glusterbot --negative-timeout=HIGH --fopen-keep-cache
21:36 doekia ~
21:36 doekia ~mount | doekia
21:36 glusterbot doekia: I do not know about 'mount', but I do know about these similar topics: 'If the mount server goes down will the cluster still be accessible?', 'mount server'
21:36 doekia mount server
21:37 doekia ~mount server | doekia
21:37 glusterbot doekia: The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns
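
In practice that means a fuse mount can name any one server, optionally with a fallback for the initial volfile fetch; the option spelling below (backupvolfile-server) is an assumption for the 3.4-era mount.glusterfs, and the hostnames, volume, and mountpoint are placeholders:

    mount -t glusterfs -o backupvolfile-server=server2 server1:/myvol /mnt/myvol
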
21:37 doekia ~stat
21:37 doekia ~stat | doekia
21:37 glusterbot doekia: I do not know about 'stat', but I do know about these similar topics: 'io-stats', 'pastestatus', 'pastevolumestatus', 'stat-prefetch', 'volstat'
21:38 doekia ~io-stats | doekia
21:38 glusterbot doekia: Command: setfattr -n io-stats-dump /mnt/glusterfs will dump IO statistics in log files. /mnt/glusterfs is GlusterFS client mountpoint.
21:38 doekia ~volstat | doekia
21:38 glusterbot doekia: Please paste the output from gluster volume status to fpaste.org or dpaste.org and post the url that's generated here.
21:39 doekia ~brick | doekia
21:39 glusterbot doekia: I do not know about 'brick', but I do know about these similar topics: 'brick naming', 'brick order', 'brick port', 'clean up replace-brick', 'former brick', 'reuse brick', 'which brick'
21:39 doekia ~nfs | doekia
21:39 glusterbot doekia: To mount via nfs, most distros require the options, tcp,vers=3 -- Also an rpc port mapper (like rpcbind in EL distributions) should be running on the server, and the kernel nfs server (nfsd) should be disabled
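
As a concrete example of that NFS factoid on an EL-style client (hostname, volume, and mountpoint are placeholders):

    mount -t nfs -o vers=3,tcp server1:/myvol /mnt/myvol
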
21:40 doekia ~rsync
21:40 doekia ~rsync | doekia
21:40 glusterbot doekia: normally rsync creates a temp file, .fubar.A4af, and updates that file. Once complete, it moves it to fubar. Using the hash algorithm, the tempfile hashes out to one DHT subvolume, while the final file hashes to another. gluster doesn't move the data if a file is renamed, but since the hash now points to the wrong brick, it creates a link pointer to the brick that actually has
21:40 glusterbot the data. to avoid this use the rsync --inplace option
21:41 doekia ~stat | doekia
21:41 glusterbot doekia: I do not know about 'stat', but I do know about these similar topics: 'io-stats', 'pastestatus', 'pastevolumestatus', 'stat-prefetch', 'volstat'
21:42 doekia ~fopen-keep-cache | doekia
21:42 glusterbot doekia: Error: No factoid matches that key.
21:42 doekia ~factoid | doekia
21:42 glusterbot doekia: Error: No factoid matches that key.
21:42 doekia ~help | doekia
21:42 glusterbot doekia: I do not know about 'help', but I do know about these similar topics: 'hack', 'hi'
21:43 doekia ~list | doekia
21:43 glusterbot doekia: I do not know about 'list', but I do know about these similar topics: 'backport wishlist', 'mailing list'
21:52 cjanbanan joined #gluster
21:57 neoice joined #gluster
22:10 rahulcs joined #gluster
22:16 pvh_sa joined #gluster
22:19 ProT-0-TypE joined #gluster
22:32 doekia ~php | doekia
22:32 glusterbot doekia: (#1) php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://joejulian.name/blog/optimizing-web-performance-with-glusterfs/ for details., or (#2) It could also be worth mounting fuse with glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH
22:32 glusterbot --negative-timeout=HIGH --fopen-keep-cache
22:33 doekia ~php | doekia
22:33 glusterbot doekia: (#1) php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://joejulian.name/blog/optimizing-web-performance-with-glusterfs/ for details., or (#2) It could also be worth mounting fuse with glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH
22:33 glusterbot --negative-timeout=HIGH --fopen-keep-cache
22:33 doekia ~fuse | doekia
22:33 glusterbot doekia: I do not know about 'fuse', but I do know about these similar topics: 'forge'
22:33 doekia ~fuse~ | doekia
22:33 glusterbot doekia: Error: No factoid matches that key.
22:34 doekia ~php | doekia
22:34 glusterbot doekia: (#1) php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://joejulian.name/blog/optimizing-web-performance-with-glusterfs/ for details., or (#2) It could also be worth mounting fuse with glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH
22:34 glusterbot --negative-timeout=HIGH --fopen-keep-cache
23:26 cjanbanan joined #gluster
23:32 rahulcs joined #gluster
23:54 rahulcs joined #gluster
