
IRC log for #gluster, 2014-06-15


All times shown according to UTC.

Time Nick Message
00:26 azathoth99 joined #gluster
00:27 azathoth99 so is gluster file level or block level storage?
00:30 TheDingy joined #gluster
00:37 TheDingy I am getting ready to test gluster out for a distributed file system. What kind of limits do you guys see on a single server that will be clustered across two locations with 40Gb/sec of connectivity and 10Gbit/sec bonded connections, using RHEL
00:38 TheDingy Right now we use ZFS primarily with 40-45 4tb drives, lots of caching etc
00:48 how-art-thou 10Gb /s network nice
00:48 how-art-thou so 4 of them makes 40?
00:48 how-art-thou bonded?
00:48 how-art-thou wow
00:48 how-art-thou should be nice n speedy
00:48 how-art-thou :)
00:49 TheDingy Yeah, and we can go up from there
00:49 TheDingy we are currently running FreeBSD and see 20Gb/sec regularly
00:54 TheDingy We can do 100gb/sec if we need to now with the new nexus 7700
00:55 TheDingy but dwdm with 40gb/sec or 10gb/sec is more reliable and cheaper for the interdc stuff
00:55 Ark joined #gluster
02:37 daMaestro joined #gluster
03:38 haomaiwa_ joined #gluster
03:44 coredump joined #gluster
03:52 davinder12 joined #gluster
03:54 XpineX joined #gluster
04:51 shubhendu_ joined #gluster
05:13 ababu joined #gluster
05:55 bala joined #gluster
05:57 ababu joined #gluster
06:37 bala joined #gluster
06:47 bala joined #gluster
06:55 ekuric joined #gluster
07:18 sputnik13 joined #gluster
07:32 ramteid joined #gluster
08:13 fsimonce joined #gluster
08:19 ProT-0-TypE joined #gluster
08:46 DV joined #gluster
09:54 davinder12 joined #gluster
10:26 LebedevRI joined #gluster
10:45 ababu joined #gluster
10:45 dudi joined #gluster
11:27 bala joined #gluster
11:34 haomaiwa_ joined #gluster
11:35 ctria joined #gluster
12:35 davidhadas joined #gluster
12:36 ThatGraemeGuy_ joined #gluster
12:40 haomaiwa_ joined #gluster
12:47 shyam joined #gluster
13:04 kdhananjay joined #gluster
13:27 koguma joined #gluster
13:30 koguma Is anyone running 3.4.3 on CentOS 5.10?
13:32 ernetas joined #gluster
13:32 ernetas Hey guys.
13:32 ernetas How come when one of my nodes goes down and comes back up, the root of the volume becomes chowned to www-data?
13:33 ernetas *sorry, to root. While it was chowned to www-data before reboot. Both on the mounted directory and unmounted.
13:43 ernetas Anyone?
13:51 koguma Unfortunately, there's a lot of people on here, but I've yet to see anyone respond...
13:51 koguma Is anyone running 3.4.3 on CentOS 5.10?
14:02 ernetas koguma: not so many, just 200...
14:02 ernetas Look at #puppet channel, there are 1000 people and >90% of them are ONLY asking questions.
14:03 koguma Well, it's puppet. :P
14:04 koguma Maybe I should ask my gluster questions on the chef channel...
14:08 koguma btw, I note glusterd runs as root, maybe it's chowning the directory to the user it's running as?
14:09 koguma I do
14:09 koguma # gluster volume create devroot replica 2 transport tcp sfdev1:/data/brick1/devroot sfdev2:/data/brick1/devroot
14:09 koguma volume create: devroot: failed
14:10 koguma and I get that failed error...
14:10 koguma The only thing that gets created are the extended attributes in /data/brick1/devroot on sfdev1
14:10 koguma And the logs just have: [cli-rpc-ops.c:805:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
14:10 koguma [input.c:36:cli_batch] 0-: Exiting with: -1
14:11 koguma and the cmd log:  : volume create devroot replica 2 transport tcp sfdev1:/data/brick1/devroot sfdev2:/data/brick1/devroot : FAILED :
14:11 koguma When I run in debug, I get a tiny bit more info.
14:12 koguma This is the weird error msg:  [glusterd-utils.c:620:glusterd_check_volume_exists] 0-management: Volume devroot does not exist.stat failed with errno : 2 on path: /var/lib/glusterd/vols/devroot
14:12 koguma That /var/lib path is specified in the mgmt vol section
14:14 koguma I've used gluster 3.2 before with none of these kinds of issues.. *sigh*
14:15 koguma Can anyone shed some light on this?
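[A minimal troubleshooting sketch for a failed volume create like the one above. The hostnames and brick path are taken from koguma's command; the checks assume the usual causes (a peer that never joined the pool, or a brick directory carrying attributes from an earlier volume) and are not from the log itself.]

    # confirm both peers are in the trusted pool and connected
    gluster peer status

    # look for leftover gluster attributes on the brick directory
    getfattr -m . -d -e hex /data/brick1/devroot

    # if trusted.glusterfs.volume-id or trusted.gfid are set from a previous
    # attempt, clear them and the .glusterfs directory before retrying
    setfattr -x trusted.glusterfs.volume-id /data/brick1/devroot
    setfattr -x trusted.gfid /data/brick1/devroot
    rm -rf /data/brick1/devroot/.glusterfs

    gluster volume create devroot replica 2 transport tcp \
        sfdev1:/data/brick1/devroot sfdev2:/data/brick1/devroot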
14:18 ninkotech joined #gluster
14:18 ninkotech_ joined #gluster
14:19 koguma no takers?
14:26 sputnik13 joined #gluster
14:30 koguma I guess I'm left with bugzilla...
14:34 coredump joined #gluster
14:34 DV joined #gluster
14:36 stickyboy Hmm, my gluster is slightly $(#*'d due to a RAID5 derp.
14:36 nbalachandran joined #gluster
14:38 koguma @ernetas looks like someone filed a bug similar to yours...
14:38 koguma https://bugzilla.redhat.com/show_bug.cgi?id=1016482
14:38 glusterbot Bug 1016482: high, unspecified, ---, gluster-bugs, NEW , Owner of some directories become root
14:42 ernetas koguma: it is similar, but not the same. I have no problem with creating new directories with non-root permissions.
14:44 koguma Gluster 3.2 is starting to look tempting...
14:45 tryggvil joined #gluster
14:58 DV joined #gluster
14:59 stickyboy Ok, raise your hand if you use JBOD on your bricks...
15:02 TheDingy stickyboy: I thought you really weren't supposed to do that
15:03 ramteid TheDingy: So I do something illegal, awesome :)
15:03 TheDingy We currently use the Backblaze with 45 drives on ZFS, but with this new system I am having to go to something else and looking at doing 9 of the 45-drive systems across three sites
15:03 TheDingy How well does gluster run with say 180tb per node
15:05 ramteid TheDingy: maybe this helps: http://jread.us/2013/06/one-petabyte-red-hat-storage-and-glusterfs-project-overview/ ?
15:06 TheDingy ramteid: thanks, hadn't seen that yes, need to read it
15:06 TheDingy Just looking quickly I do have HA requirement
15:07 TheDingy and Geo-replication
15:07 ramteid TheDingy: yw, if I recall someone uses it also with massive data (video storage)
15:08 nbalachandran joined #gluster
15:10 ababu joined #gluster
15:12 koguma Anyone have volume creation problems on 3.4.3 w/ CentOS 5.x?
15:14 stickyboy TheDingy: I think the Red Hat storage guide recommends RAID6 actually..
15:15 TheDingy stickyboy: I saw that
15:16 stickyboy TheDingy: Also, I'm using SATA, and I'll probably never do that again.
15:16 TheDingy Why won't you use sata? Not enough performance?
15:16 TheDingy We are getting great performance at 180tb
15:17 TheDingy with ZFS on a single node, but we need to set it up so we can lose up to one site; there will be 1/3rd of our servers at each of 3 sites
15:18 tziOm joined #gluster
15:18 stickyboy TheDingy: I'm increasingly skeptical about SATA... first of all we're using Seagate, second I realize (now) that SAS is more reliable.
15:18 TheDingy AND the customer has asked for expandability solutions to five sites from three AND 3pb of storage, all HA
15:19 TheDingy stickyboy: IMHO SAS is only more reliable due to the fact of storage density
15:19 koguma There's a world of performance difference between sata and sas.
15:19 koguma Just like there's a world of performance difference between sas and ssd.
15:20 TheDingy koguma: This is true
15:20 stickyboy TheDingy: Yeah, my next move will be WD drives (SATA) and RAID6.
15:21 TheDingy But if you use enterprise drives we can get 4tb of density; SATA 3 is DAMN close on mechanical drives, and when you are doing 450 drives
15:21 TheDingy Price becomes an issue
15:22 stickyboy TheDingy: Backblaze had some nice numbers on hard drive reliability a few months back, did you see it?  Pretty damning for Seagate.
15:22 haomaiwang joined #gluster
15:23 koguma Since people are waking up... anyone running 3.4.3 on CentOS 5.x?  I'm unable to create volumes on a fresh install.
15:24 TheDingy stickyboy: Yes, I did see it
15:24 stickyboy koguma: Nah, moved off CentOS 5.3 a few months ago.  Luckily.
15:25 ernetas Ghmm... damn it! How on earth do I setup quorum for GlusterFS?
15:25 TheDingy They are saying the 4tb Hitachi's are looking good
15:25 ernetas I keep running into split-brains even with quorum set to >51%.
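[For reference, a sketch of the quorum options under discussion, using the gluster volume set interface; the volume name gv0 is a placeholder and the exact values depend on the cluster layout.]

    # client-side quorum: only allow writes while a majority of each replica set is reachable
    gluster volume set gv0 cluster.quorum-type auto

    # server-side quorum: glusterd stops its bricks when the trusted pool loses quorum
    gluster volume set gv0 cluster.server-quorum-type server
    gluster volume set all cluster.server-quorum-ratio 51%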
15:26 stickyboy TheDingy: Yah, Hitachis are hard to get, though, eh?
15:26 koguma stickyboy: I've got some legacy stuff I need running on CentOS 5.x.. *sigh*
15:26 TheDingy They can be but you can find them
15:26 stickyboy TheDingy: Hitachi went to WD, right?
15:28 TheDingy Yes, but you can still get the 4tb in true Hitachi and I can't tell you how many we had deployed but they worked well
15:29 stickyboy TheDingy: Nice.
15:30 stickyboy I think I'll go with WD if I can't find Hitachi.  Gotta be better than Seagate.
15:30 stickyboy And it's hard to get spares in Africa (Kenya).
15:33 koguma stickyboy:  I hear the Samsung drives Seagate acquired are still ok, I think they're still made by the Korean plant..
15:33 TheDingy The Seagates aren't that good at all
15:34 koguma stickyboy:  imho I do like Hitachi drives, too bad Samsung sold their drives to Seagate, those were great too.
15:34 TheDingy The better seagates are good drives
15:35 TheDingy stickyboy: Most of the Backblaze data is from their cheaper drives; I have a couple of hundred Barracudas and have had 0 failures after burn-in on Seagate
15:36 koguma I think the Seagate/Samsung Spinpoints might still be decent.
15:37 haomaiwang joined #gluster
15:39 sputnik13 joined #gluster
15:40 stickyboy So I think my next augmentations to this infrastructure will be WD drives + RAID6.
15:42 haomaiwa_ joined #gluster
15:43 ababu joined #gluster
16:03 Ark joined #gluster
16:03 ababu joined #gluster
16:12 [o__o] joined #gluster
16:34 davinder12 joined #gluster
16:38 davinder12 joined #gluster
16:51 ababu joined #gluster
17:02 rjoseph joined #gluster
17:11 stickyboy TheDingy: So I have a failed RAID5 underneath one of my gluster replicas.
17:11 stickyboy I'm not sure how to proceed now...
17:13 nbalachandran joined #gluster
17:16 primechuck joined #gluster
17:32 FooBar just replace the disk and rebuild the raid ?
17:32 FooBar gluster wouldn't care, performance is just down a bit
17:39 stickyboy FooBar: Nah, RAID5 lost two disks... array is gone. :)
17:40 stickyboy Gotta construct a new RAID5 and then mount it to the same place... then I guess replace brick?
17:51 primechuck joined #gluster
17:54 FooBar stickyboy: ah ok ... that sucks
17:55 FooBar yup... but not required to be in the same space
17:55 FooBar you can replace brick with a new path
17:55 FooBar if replacing with same path... you need a --force iirc
18:04 jbd1 joined #gluster
18:05 stickyboy FooBar: Cool.
18:05 stickyboy btw I took all my glusters down during this period... when it comes up it will look for its replica and fail.
18:05 stickyboy Do I have to do anything before replacing brick?
18:13 FooBar nope... as long as you have replicas
18:17 stickyboy FooBar: I guess I need to see it to believe it.  Lemme get the RAID back up first.  Gotta go to the server room and replace drives first.
18:28 haomaiwa_ joined #gluster
18:32 FooBar stickyboy: and put better monitoring on your raids :) ... so you see a disk failure before disk 2 goes ;)
18:41 daMaestro joined #gluster
19:03 stickyboy FooBar: I blame solar flares.
19:04 FooBar :)
19:04 FooBar I don't use any raid under my gluster volumes
19:04 FooBar just triple-replicas
19:04 stickyboy Ok, just replaced the drives and re-created the RAID.
19:04 stickyboy FooBar: So you just pass the disks through as JBOD?
19:08 FooBar stickyboy: yup... saves me a lot of I/O's
19:09 FooBar 1 write becomes 3 writes... not 2x(raid-size) writes
19:09 stickyboy FooBar: True.
19:11 FooBar and it's a bit more overhead in size... might go back to 2 replicas + 1 offsite later; now it's 3 replicas + 1 offsite backup
19:11 stickyboy FooBar: So you mkfs each drive independently and mount like 12 bricks or whatever?
19:12 FooBar stickyboy: http://paste.sigio.nl/p9qdtguuv
19:12 glusterbot Title: Sticky Notes (at paste.sigio.nl)
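[The paste itself is not reproduced in the log. A rough sketch of the per-disk-brick layout FooBar describes, with placeholder device names, mount points and hostnames: each disk gets its own filesystem and brick, and consecutive bricks in the create command form the replica-3 sets.]

    # format and mount each disk as its own brick (repeat per disk)
    mkfs.xfs -i size=512 /dev/sdb
    mkdir -p /bricks/sdb
    mount /dev/sdb /bricks/sdb

    # distributed-replicated volume, replica 3, each set spread across the three servers
    gluster volume create gv0 replica 3 \
        server1:/bricks/sdb/brick server2:/bricks/sdb/brick server3:/bricks/sdb/brick \
        server1:/bricks/sdc/brick server2:/bricks/sdc/brick server3:/bricks/sdc/brick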
19:25 stickyboy FooBar: Nice.
19:25 stickyboy I think I'll stick to RAID, ala the RedHat storage recommendations.
19:28 FooBar yup... this is a 'weird' setup
19:29 stickyboy Now's probably a good time to update the firmware on my RAID controllers.
19:35 FooBar :P
19:36 TheDingy stickyboy: FooBar has it right; on any large system you don't use RAID controllers, you just use them as block devices
19:38 qdk joined #gluster
19:38 stickyboy TheDingy: Yeah, I like the idea of that.  Also, RAID controllers are iffy.
19:39 stickyboy I read an interesting blog post about ZFS which has some general insight about storage design
19:40 TheDingy I really like ZFS but for single machines only been using it for over 10 years
19:40 stickyboy http://nex7.blogspot.com/2013/03/readme1st.html
19:40 glusterbot Title: Nex7's Blog: ZFS: Read Me 1st (at nex7.blogspot.com)
19:43 stickyboy Especially the stuff about RAID cards vs HBA's, SATA vs SAS
19:46 TheDingy The SATA vs SAS stuff, he is off on some of it
19:47 TheDingy There are good interposers out there you just have to evaluate your disks
19:48 ernetas Hey guys.
19:49 ernetas A little bit more complicated question... How do I automatically remount a Gluster mount when the Gluster server comes back online?
19:50 ernetas Say, I have a quorum of 51% and 3 nodes. 2 nodes go down. The last one shuts itself down. But when it comes back up, it does not automatically remount the lost mount.
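[There is no answer in the log; one crude workaround, not a Gluster feature, is a periodic re-mount check from cron. The mount point /mnt/gv0 is a placeholder and this assumes the volume already has an fstab entry.]

    # /etc/cron.d/gluster-remount: re-mount the FUSE mount if it has dropped off
    * * * * * root mountpoint -q /mnt/gv0 || mount /mnt/gv0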
19:53 stickyboy FooBar: btw, meant to ask what transport you're using
19:58 FooBar tcp
20:00 FooBar TheDingy / stickyboy: That setup is currently 3 machines of 16 4TB disks each, as a triple-replicated distributed setup
20:01 stickyboy FooBar: Cool.  I'm using TCP over 10GbE (but copper).
20:04 FooBar 1gbe copper here...
20:04 FooBar don't have ANY 10gb gear yet (only the trunk-ports on the switch are 10g/20g)
20:37 TheDingy FooBar: With that size system you really should go to 10gig; it is getting cheap
20:38 FooBar TheDingy: yup... multiple 1g is still sufficient... it's just a lot of space, not much I/O
20:38 TheDingy Ahhh
20:39 FooBar and all very small files
20:39 FooBar average size < 4k
20:42 stickyboy The latency on 10GbE copper almost kills me.
20:42 stickyboy I have the throughput, but the latency... wow.
20:43 TheDingy How much latency? Which switchgear are you using?
20:44 stickyboy TheDingy: I'm using an Arista 10GbE switch
20:44 stickyboy Haven't measured latency... but "ls" is slow... :)
20:45 stickyboy We use our gluster for home directories in a cluster.
20:45 TheDingy Well you need to measure it if it is slow
20:46 stickyboy TheDingy: ls is slow.  That's enough for me. :)
20:46 stickyboy Dunno how else to measure.  Ping?
20:46 TheDingy I would need to quantify it
20:46 TheDingy It better be under 1ms
20:47 TheDingy May not be network
20:48 TheDingy I never have used the Arista though
20:48 TheDingy It could be something funky going on with the interfaces as well;
20:51 andreask joined #gluster
20:51 stickyboy TheDingy: Yah, I tested a few components a few months ago, trying to isolate.
20:52 FooBar stickyboy: ls is always slow :)
20:52 stickyboy Raw dd to brick.  iperf over interfaces.
20:54 TheDingy that is what I would do just to start
20:55 TheDingy If you don't have a network tester, I think htop will help when you do that
21:03 stickyboy TheDingy: I used iperf
21:05 TheDingy What did you figure out?
21:10 TheDingy I am trying to figure out a bbq cook-off that I want to go to
21:28 stickyboy TheDingy: I used iperf a few months ago when we installed the 10GbE.  Throughput is 9.91Gbit. :)
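[For reference, the component tests mentioned above, sketched with placeholder names for the server and brick path.]

    # raw write speed of the brick filesystem, bypassing gluster
    dd if=/dev/zero of=/data/brick1/testfile bs=1M count=4096 oflag=direct

    # network throughput: run "iperf -s" on the server, then from a client:
    iperf -c server1 -t 30

    # rough round-trip latency between client and server
    ping -c 20 server1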
21:28 stickyboy Now, still dealing with this failed RAID.
21:29 stickyboy Every time I reboot my clients/servers half of them fail to boot due to "Mounting filesystems..."
21:29 stickyboy Or only mount some of my Gluster mounts (FUSE).
21:29 stickyboy I have to keep running to the server room.
21:31 TheDingy This is exactly why you do not use RAID in redundant large systems ;)
21:32 TheDingy Our very large SGI systems always used software raid back in the day
21:34 stickyboy TheDingy: I guess we have to learn all these lessons for ourselves. :)
21:35 stickyboy But I've been burned by hardware RAID before.
21:35 TheDingy This is true ;) I was very lucky to work on some crazy systems when I was very green
21:35 stickyboy And I was at the mercy of the vendor, booting to funny DOS disks and shit.
21:35 stickyboy At least with md raid you can boot to some GNU/Linux live CD.
21:36 TheDingy In Novell 3.1 I had a machine boot the raid array and write 1's across the entire disk
21:37 TheDingy Stupid customer hadn't had a backup in about 6 months because the daughter went to college and "forgot" to tell ppl to change the tape
21:37 TheDingy But Novell was SO reliable, in 93 timeframe I would measure uptime in years
21:38 TheDingy Also I make sure none of my vdevs in ZFS ever have more than the raid level on the same controller
21:39 stickyboy TheDingy: Do you ever have problems with systems pausing foreverrr at "Mounting filesystems..."?  It's really annoying.
21:40 TheDingy Not lately
21:40 TheDingy What about your drivers and OS?
21:44 stickyboy TheDingy: CentOS 6.5
21:44 TheDingy GFS or something else on this?
21:44 stickyboy I always have to go to single user mode then comment out my GlusterFS FUSE mounts in /etc/fstab
21:45 stickyboy Happens on some systems, not others.
21:45 TheDingy Raid drivers
21:45 stickyboy For a while I thought I'd solved it by adding LINKDELAY=20 to the network interface
21:46 TheDingy You're using 10gig right? Which card?
21:47 stickyboy TheDingy: Intel cards
21:47 stickyboy All Intel, clients and servers.
21:47 TheDingy They did have some type of problem with their built in stuff
21:47 TheDingy I forget what it was
21:48 TheDingy Read about it on a ZFS blog
21:51 stickyboy TheDingy: About Intel NICs?
21:51 TheDingy Yep
21:51 stickyboy TheDingy: Do you use _netdev in your fstab entries for Gluster FUSE mounts?
21:52 TheDingy yes
21:52 stickyboy I don't think CentOS honors that.
21:52 TheDingy but I am just now designing another very large gfs system
21:52 stickyboy Even with the netfs service enabled at boot.
21:52 TheDingy It has been about two years
21:52 TheDingy it should
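[For reference, the pattern being discussed: on CentOS 6 the netfs service is what mounts _netdev entries once the network is up. Server name, volume and mount point below are placeholders.]

    # /etc/fstab: mount the Gluster volume over FUSE after the network is up
    server1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev  0 0

    # make sure netfs is enabled so _netdev entries are mounted at boot
    chkconfig netfs on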
21:55 stickyboy Others boot fine, but only mount 2/3 of my volumes.
21:56 TheDingy weird you have something funky going on when your network interfaces come up
21:56 stickyboy Yah
21:56 stickyboy Sometimes the volume log says that the transport endpoint isn't connected
22:00 TheDingy falping interface?
22:00 TheDingy flapping interface
22:00 TheDingy I have seen that on 10g/e
22:01 stickyboy Flapping?
22:06 TheDingy Interface Flapping
22:06 glusterbot TheDingy: Please don't naked ping. http://blogs.gnome.org/markmc/2014/02/20/naked-pings/
22:07 TheDingy stickyboy: a lot like BGP flapping; it can happen with various very fast protocols as well. Look for bad cables, optics, etc
22:08 TheDingy and if you're on copper it can happen very easily; 10gb with copper isn't a good idea right now
22:09 stickyboy TheDingy: Yah, we're using Cat6a, so it should ideally be "better"
22:09 stickyboy But I'm thinking of suggesting moving to Infiniband
22:10 TheDingy How many systems on Infiniband?
22:10 TheDingy With just 3 I don't know if I would take on the headache
22:10 stickyboy TheDingy: 6
22:11 stickyboy err 7, but yeah.
22:11 stickyboy But we're doing genome sequencing and scientists are always hammering this stuff.
22:11 TheDingy Don't know if it is worth the trouble
22:11 stickyboy TheDingy: Maybe 10GbE over some optical stuff then
22:11 TheDingy if your doing super compute etc I would do it
22:11 TheDingy Yeah, wouldn't do copper
22:11 TheDingy or look to 40gb
22:12 TheDingy I would do 40gb over Infiniband on anything smaller than 50-60 computers
22:14 stickyboy TheDingy: would or wouldn't?
22:15 TheDingy wouldn't
22:15 stickyboy That's what I thought
22:15 stickyboy I am not a network guy, so I think I'm more comfortable with Ethernet
22:15 stickyboy I need to look into switches
22:15 TheDingy Also I don't know about that switch; I would look into Cisco/Brocade/Juniper
22:15 TheDingy with optical
22:16 TheDingy Where are you located? I forgot
22:16 stickyboy TheDingy: Kenya
22:17 TheDingy Been a long time since I have been there and you didn't tell me that ;) I was debating NL and Chicago
22:19 stickyboy I mentioned Kenya earlier. :P
22:19 TheDingy I figured
22:19 TheDingy my memory has been bad today
22:19 stickyboy Lots of science out here.
22:20 TheDingy Yeah, do they hire many outside consultants from the states?
22:20 stickyboy TheDingy: All the time
22:20 stickyboy I've been here for 5 years though
22:21 TheDingy Are you from the states?
22:21 stickyboy Guess I'm damn near Kenyan.
22:21 stickyboy Yah
22:21 TheDingy LOL
22:22 rwheeler joined #gluster
23:46 stickyboy TheDingy: Ok, so I'm ready to replace this brick.
23:46 TheDingy stickyboy: LOL I would have already done that ;)
23:46 stickyboy But I don't know if the `replace-brick` feature was designed for my scenario.
23:46 stickyboy TheDingy: I got caught being anal and re-writing some ansible playbooks. :P
23:46 TheDingy nope it wasn't
23:47 TheDingy you have triple redundancy right?
23:47 stickyboy TheDingy: replica 2
23:47 stickyboy (and one of the replicas failed)
23:47 TheDingy How much data?
23:47 stickyboy 22TB
23:47 TheDingy Oh crap and 3 nodes?
23:48 TheDingy Or how much usable per node and how many nodes
23:48 stickyboy Only 2 nodes actually
23:48 stickyboy Just simple replica 2 with 2 nodes.
23:48 TheDingy So your second node is failing already?
23:48 TheDingy Or is that the one that you were working with?
23:49 stickyboy The second node failed, which is the one I just did the RAID rebuild / format / mount on.
23:49 stickyboy Now I've got a clean brick, just dunno how to get the data into it.
23:49 TheDingy Raid controller supports JBOD?
23:50 stickyboy TheDingy: Yah, I think it must.
23:50 TheDingy stickyboy: Do you have a way to make a quick backup?
23:50 stickyboy TheDingy: A backup of the data on the working node?
23:51 TheDingy Yes
23:51 stickyboy Nope
23:51 TheDingy Ugh
23:51 TheDingy The volume replace-brick
23:51 TheDingy you may have to do a force then a heal and monitor it
23:52 stickyboy So the replace-brick command will be like...
23:53 stickyboy gluster volume replace-brick <volume> node2:<oldpath> node2:<newpath> start
23:53 TheDingy volume replace-brick volume a 192.168.2.2:/mount/g2lv5
23:53 TheDingy welll yea
23:53 stickyboy So old and new are the same, only the path is different.
23:54 stickyboy I wonder if I could add a brick, then rebalance.
23:54 TheDingy you have to do a force
23:54 TheDingy if names etc are the same
23:54 TheDingy be very very careful though
23:55 stickyboy I've mounted the new brick at a different path.
23:55 TheDingy then add and rebalance, and after that disable the old brick; that is the procedure that I would try BUT it has been years since I did this
23:56 stickyboy Seems like replacing a non-existent brick would just fail outright.
23:57 TheDingy not always been has been problematic
23:59 TheDingy not always but it has been problematic
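[For what it's worth, the usual 3.4-era sequence for swapping an empty brick in on the same server is sketched below; volume name and paths are placeholders. "commit force" skips the old data-migration step and relies on self-heal to repopulate the new brick from the surviving replica.]

    # point the volume at the rebuilt brick
    gluster volume replace-brick gv0 node2:/data/oldbrick node2:/data/newbrick commit force

    # trigger a full heal from the surviving replica and watch its progress
    gluster volume heal gv0 full
    gluster volume heal gv0 info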
