
IRC log for #gluster, 2012-10-26


All times shown according to UTC.

Time Nick Message
00:03 badone joined #gluster
00:15 bfoster joined #gluster
00:16 jdarcy_ joined #gluster
00:16 kkeithley1 joined #gluster
00:40 bala1 joined #gluster
00:47 Ryan_Lane joined #gluster
00:47 Ryan_Lane I have one volume that's getting a lot of errors like this: http://pastebin.com/ws3A3vSx
00:47 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
00:54 Ryan_Lane seems I'm getting a bunch of self-heal failures
00:56 UnixDev joined #gluster
01:01 Bullardo joined #gluster
01:32 Bullardo joined #gluster
01:44 JZ_ joined #gluster
01:56 Bullardo joined #gluster
02:08 UnixDev joined #gluster
02:08 UnixDev how can I delete all gluster volumes on a host?
02:14 avati_ joined #gluster
02:15 sunus joined #gluster
02:29 Bullardo joined #gluster
02:44 sunus joined #gluster
02:58 ika2810 joined #gluster
03:02 dmachi1 joined #gluster
03:14 JoeJulian dmachi1: (mostly) synchronously
03:15 JoeJulian Ryan_Lane: I wonder if the .glusterfs/00/00/000*0001 is a directory instead of a symlink on one of your bricks.
03:16 JoeJulian UnixDev: Are your volumes single-server volumes?
03:17 JoeJulian UnixDev: ... also does the server you wish to delete the volumes from have any peers?
03:17 UnixDev they did, finally got them deleted
03:17 JoeJulian Ah, ok
03:17 UnixDev seems like they got stuck on one peer
03:18 sunus JoeJulian: hi, i have this problem and i've tried almost everything i know but i still can't get rid of it: http://community.gluster.org/q/can-not-create-new-volume-after-created-one-volume/
03:18 UnixDev btw, I find your blog very helpful… thank you very much
03:18 glusterbot Title: Question: can not create new volume after created one volume. (at community.gluster.org)
03:18 JoeJulian You're welcome. Thanks for the feedback. :)
03:19 UnixDev been testing gluster possibly for a new production setup in vmware… some things are great, but ended up with a split-brain during a failure test scenario
03:20 sunus and i post 2 comments in semiosis's answer
03:20 JoeJulian @extended attributes
03:20 glusterbot JoeJulian: (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://hekafs.org/index.php/2011/04/glusterfs-extended-attributes/
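For reference, here is the first command from the factoid above as it would be run against a file directly on a brick; the brick path is hypothetical:

    # dump all extended attributes of a file on the brick, hex-encoded (run on the server)
    getfattr -m . -d -e hex /export/brick1/some/file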
03:20 JoeJulian But I honestly don't think that's the issue.
03:21 sunus JoeJulian: me too. because they are all clean, newly created dirs
03:21 Technicool UnixDev, are you using Virtual Center?
03:21 UnixDev yes
03:21 Technicool replicated volume?
03:22 bharata joined #gluster
03:22 Technicool UnixDev, ^^
03:23 JoeJulian Technicool: http://fpaste.org/3OBY/
03:23 glusterbot Title: Fedora Pastebin - by Fedora Unity (at fpaste.org)
03:23 Technicool thanks JoeJulian
03:23 Technicool hmm, that should work ok
03:24 JoeJulian I know, right?
03:24 Technicool esx 5?
03:24 JoeJulian But the "not available" seems odd.
03:24 UnixDev Technicool: 5.1
03:24 Technicool last i tested for was 4.1, but i doubt if anything changed on the vmware side for this
03:25 Technicool that error though, i got that the other day
03:25 Technicool and it wouldn't clear
03:25 anti_user hey glusters!
03:25 Technicool and i can't remember what i did to fix it...
03:25 Technicool ugh...so old...
03:25 JoeJulian lol
03:26 JoeJulian UnixDev: Can I see the etc-glusterfs-glusterd.vol.log from .3 plese?
03:26 JoeJulian s/plese/please/
03:26 glusterbot What JoeJulian meant to say was: UnixDev: Can I see the etc-glusterfs-glusterd.vol.log from .3 please?
03:26 Technicool ah, didn't see that it was striped
03:26 Technicool that would not be my preferred setup for vm's
03:27 anti_user if i resize the partition on one node where my datastore is placed, what will happen to the gluster cluster?
03:27 Technicool as long as you aren't doing it to make things "faster" at least i guess
03:27 anti_user resize - maximize
03:27 JoeJulian I already pointed him at ,,(stripe). But I also made the assumption he's still in testing phase.
03:27 glusterbot Please see http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/ about stripe volumes.
03:27 UnixDev Technicool: yes, I was using a replicated volume with heartbeat and a virtual ip on vcenter 5.1
03:28 Technicool it still doesn't explain the split brain part, and i don't use stripe enough to know what to look for
03:28 JoeJulian anti_user: If you resize the filesystem of a brick of a distributed volume, the volume capacity will change accordingly.
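A minimal sketch of the resize JoeJulian describes, assuming the brick sits on XFS on top of LVM; the device, brick and mount names are hypothetical:

    lvextend -L +50G /dev/vg0/brick1   # grow the logical volume under the brick
    xfs_growfs /bricks/brick1          # grow the brick filesystem online
    df -h /mnt/datastore               # the mounted gluster volume shows the new capacity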
03:28 Technicool UnixDev, unless something has changed, vips aren't supported via VC are they?
03:28 UnixDev Technicool: split brain happened during simulated failure and then failback
03:28 sunus JoeJulian: getfattr get nothing from output
03:28 Technicool if you want to use VIP's, you need to take VC out of the mix
03:29 UnixDev Technicool: the vip is something that happens on the storage nodes
03:29 UnixDev vcenter sees 1 ip, the vip
03:29 Technicool do the test again with ESX directly
03:29 Technicool UnixDev, yes, but does the VIP ever change?
03:29 Technicool eg, when failure occurs, a new host takes over and uses that VIP a la carp?
03:30 UnixDev it changes nodes, yes, during simulated failure of one of the storage nodes (providing nfs export to vcenter on vip)
03:30 Technicool if so, VC doesn't know how to deal with that
03:30 Technicool at least not in vSphere 4, and i doubt if that has changed although if anyone can correct me i would love to be wrong about it finally
03:30 UnixDev what do you mean? 4 and 5 are totally different
03:31 JoeJulian sunus (I mean): Can I see the etc-glusterfs-glusterd.vol.log from .3 please?
03:31 Technicool UnixDev, they aren't that different, and this one piece i would bet has not changed
03:31 Technicool again, would love to be wrong
03:31 Technicool but
03:31 Technicool in case i am not
03:32 Technicool try with just ESX, no management via VC or cloudsphere or viacloud or whatever they call it now
03:32 Technicool if that works, then its very likely the same issue as before, that VC does not know how to handle changes made outside of its control
03:33 Technicool esp. when it comes to network issues  / hostnames or IP changes etc
03:33 Technicool you can simulate the same thing without Gluster, just set up a bunch of round robin IP's, fail a node, and watch all your VM's grey out
03:33 UnixDev Technicool: this is outside of the scope of vc; if the vip changes servers, it does not matter because vmware sees the same ip all the time…
03:33 Technicool UnixDev, incorrect
03:34 sunus JoeJulian: ok, but i don't have /etc/glusterfs/glusterd.vol.log, not even /etc/glusterfs; i compiled it from source
03:34 Technicool if the vip changes, then VC knows it changed because there are values that don't match its internal DB, but it doesn't know how they changed, who changed them, or what to do about it
03:35 JoeJulian sunus: Where are your logs? Looking for *.glusterd.vol.log there.
03:35 Technicool VC cares about more than just the IP, if it didn't the round robin test i described above would never have an issue, right?
03:35 sunus JoeJulian: ok, wait a sec i will post it when i find it
03:36 UnixDev round robin is not the same thing, there needs to be a mount session.. and perhaps this is the failure, esx needs to try to "remount" it
03:36 JoeJulian btw, why *are* you compiling your own?
03:36 Technicool UnixDev, ok, just telling you based on the experience i had from when i worked at VMware
03:37 sunus JoeJulian: i am trying to learn from it and write some xlators..
03:37 sunus JoeJulian: ^^
03:37 JoeJulian Ah, cool.
03:37 JoeJulian very cool in fact
03:37 Technicool sunus, very cool
03:37 UnixDev Technicool: so what do you recommend for interfacing esx with gluster?
03:38 Technicool at this time i don't actually, not in production
03:38 * JoeJulian prefers using an install dvd to fix esx.
03:38 anti_user okay! thanks JoeJulian, another question: is it really possible to use a brick as one partition (not a whole server)? for example i have 2 servers and 5 disks on each server, can i add a brick as one hard drive, then another brick as the next hard drive and so on
03:38 Technicool but, if it is in a lab, i was able to run 80 VM's simultaneously with acceptable performance using the worst hardware imaginable in a 2 node pure DHT setup
03:38 sunus JoeJulian: in fact, i read some code of it but when i actually try it, it gets the errors i mentioned above. now i gotta get the logs for you:)
03:39 JoeJulian anti_user: GlusterFS only cares about filesystems. If you create a filesystem on each hard drive and want to make those each a brick, that will work just fine.
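A sketch of the one-brick-per-disk layout being discussed; device names, brick paths and hostnames are hypothetical:

    # one filesystem per disk, each mounted as its own brick
    mkfs.xfs /dev/sdb1 && mkdir -p /bricks/disk1 && mount /dev/sdb1 /bricks/disk1
    mkfs.xfs /dev/sdc1 && mkdir -p /bricks/disk2 && mount /dev/sdc1 /bricks/disk2
    # replica pairs span the two servers, so losing one disk or server keeps data available
    gluster volume create datastore replica 2 \
        server1:/bricks/disk1 server2:/bricks/disk1 \
        server1:/bricks/disk2 server2:/bricks/disk2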
03:40 anti_user we want to do it, because company want to buy RAID controllers for each servers
03:41 anti_user *dont want, sorry
03:42 UnixDev Technicool: what would be used instead of gluster in production to deliver similar features?
03:43 anti_user how can our company donate to this project? we are very happy to have discovered GlusterFS and begun using it
03:43 JoeJulian ~commercial | anti_user
03:43 glusterbot anti_user: Commercial support of GlusterFS is done as Red Hat Storage, part of Red Hat Enterprise Linux Server: see https://www.redhat.com/wapps/store/catalog.html for pricing also see http://www.redhat.com/products/storage/ .
03:43 Technicool UnixDev, i wish i knew...i pursued the issue with them, but it was actually the results of that that caused me to cancel all my VMware subscriptions
03:45 Technicool anti_user, we are also thrilled with developer contribution if you are looking for a way to give back
03:46 sunus JoeJulian: glusterd -l /root/vm3.log --debug   is that enough to get the logs?
03:47 anti_user we will be glad if the project stays free under the gpl license and stays under active development
03:47 Technicool UnixDev, i can also say, the worst issue i ever saw with any product, anywhere, was gluster and VMware...
03:47 JoeJulian anti_user: Or just hang out in here and help people. That also helps you become even more of an expert in  the field.
03:48 Technicool there was a two sided bug in NFS, wherein we sent a null packet, and VMware improperly handled it
03:48 Technicool the end result was consecutive chain PSOD's resulting from DRS migrating the bad VM as its last heroic deed..."Run little buddy! Save yourself!"
03:49 JoeJulian sunus: --debug will override -l
03:49 JoeJulian Just do the -l
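A sketch of getting a verbose glusterd log into a file, per JoeJulian's note that --debug overrides -l; the --log-level option is assumed here from the common glusterfs option set:

    glusterd -l /root/vm3.log --log-level=DEBUG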
03:49 Technicool so it would, to the next ESX host, then the next, PSOD'ing each as it went along
03:49 anti_user okay! i'll try it, but i'm not yet even an advanced user of gluster (been using it for a few days)
03:49 sunus JoeJulian: i got the logs, i will upload in a sec
03:50 Technicool it was an almost impossible bug to repro, and the bug has been fixed on both sides now (at least, on ours i know it has), but it was still the kind of horrific event that you can't look away from or stop eating popcorn while you watch
03:50 UnixDev Technicool: there was an svn commit not too long ago; before it, esx machines could not power on from gluster nfs
03:50 anti_user we want to use gluster as storage of about 100TB, we can try chrooting users there over SFTP, FTP, SAMBA and maybe rsync
03:50 Technicool as in ESX installed on Gluster?
03:51 Technicool thats pretty cool
03:51 UnixDev as in esx storing vms on gluster volume
03:51 Technicool wait, which part couldn't power on?
03:51 Technicool or do you mean, storing on gluster and accessing via the native client as opposed to NFS?
03:52 UnixDev https://bugzilla.redhat.com/show_bug.cgi?id=835336
03:52 glusterbot Bug 835336: unspecified, unspecified, ---, ksriniva, ON_QA , gluster nfs fails to launch vmware esxi VMs
03:52 Technicool because if THAT happened, then that might actually get you what you need
03:52 * JoeJulian prefers kvm on fuse mounts. No fiddling about with kludging things together to force them to work.
03:52 Technicool reading
03:52 UnixDev JoeJulian: what do you use to manage the kvm's ?
03:52 UnixDev JoeJulian: are you using KSM, and if so, how are the ratios?
03:52 JoeJulian puppet currently. I plan on migrating to openstack.
03:53 UnixDev JoeJulian: I've looked at openstack.. also cloudstack is now free as well
03:53 JoeJulian I don't, though I probably should.
03:53 JoeJulian I'd also like to look at oVirt.
03:54 UnixDev I experimented with that heavily in the lab, latest release… very fragile… I think it will be ready, but does not seem like it today
03:55 Technicool UnixDev, weird....was definitely not a problem for me with 3.0, 3.1, and 3.2.x with ESX or ESXi
03:56 Technicool ovirt is a great answer to ESXi managed by hand/powershell/rcli
03:56 sunus JoeJulian: hi log: http://fpaste.org/h8Bg/
03:56 glusterbot Title: Viewing log by sunus (at fpaste.org)
03:56 UnixDev Technicool: you have ovirt managing esx hosts?
03:56 Technicool lol no, i have no more ESX hosts since that last interaction mentioned above
03:57 Technicool not even ESXi
03:57 JoeJulian sunus: Can you do that again with the failed volume creation?
03:57 UnixDev ahh, lol
03:57 Technicool their effort and answers did not make me feel like being a paying customer anymore, but im a demanding customer
03:58 Technicool i would say that KVM has made amazing strides, i wouldn't have ever considered it in production three years ago
03:58 JoeJulian True
03:58 Technicool not unless i wanted unemployment
03:58 sunus JoeJulian: i think i fetched the log after the failed creation attempt, i will do it again to make sure
03:59 Technicool but i have been pretty pleased lately with it...the main thing that made me stay away was that i had my automation fairly slick in ESX via VMware Studio
03:59 Technicool once i decided to actually do the work for the same automation in KVM, it actually is faster for me now
04:00 Technicool and just saw that ovirt supports pools, which was the thing i loved the most about vSphere
04:00 Technicool pools as in start and stop groups of vm's at once
04:00 sripathi joined #gluster
04:01 Technicool so if i want eight gluster nodes, push a button and bam, just like i had it before
04:01 JoeJulian kvm + puppet supports Jacuzzis.
04:01 JoeJulian ... as in I have time to hang out in the jacuzzi.
04:01 Technicool lol JoeJulian
04:02 UnixDev lol
04:02 bulde1 joined #gluster
04:02 Technicool redundant server farm?  $100,00  3 year support and professional services $32,000
04:02 Technicool jacuzzi time?
04:02 Technicool priceless
04:02 * Technicool needs to get some of the KVM/puppet magic going one day as well
04:02 sunus JoeJulian: logs has no changes..
04:03 Technicool for now its customization via guestfish
04:03 * JoeJulian is confused.
04:04 JoeJulian ~pastestatus | sunus
04:04 glusterbot sunus: Please paste the output of "gluster peer status" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
04:04 JoeJulian From more than one server please.
04:07 sunus JoeJulian: http://flic.kr/p/dnTr82
04:07 glusterbot Title: Screenshot from 2012-10-26 12:08:10 | Flickr - Photo Sharing! (at flic.kr)
04:07 sunus JoeJulian: i am working with qemus, so the screenshots is more fast
04:09 JoeJulian sunus: "yum install fpaste" is even more fast than that. :D
04:10 sunus JoeJulian: lol, sorry for i don't even know what that is:( vm2 is 192.168.10.3 and vm1 is 192.168.10.2
04:10 * JoeJulian swears at his computers in much the same way as sunus.
04:10 Technicool i would never do that
04:10 JoeJulian lol
04:10 Technicool swear at your guys computers
04:10 Technicool but mine?  oh man...
04:10 Technicool oh...man...
04:10 JoeJulian You probably use words that I haven't even heard of when you swear at computers.
04:11 Technicool UnixDev, im assuming you enabled the 32 bit NFS already?
04:12 JoeJulian I hate the ambiguity of "failures" in the rebalance status.
04:12 UnixDev Technicool: what do you mean?
04:12 sunus JoeJulian: hi got anything strange?
04:12 Technicool UnixDev, setting nfs.enable-ino32: on
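For reference, that setting is applied per volume; the volume name here is hypothetical:

    gluster volume set datastore1 nfs.enable-ino32 on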
04:13 JoeJulian sunus: No. What server were you issuing the create volume command from?
04:14 JoeJulian nm... vm1...
04:14 * Technicool says jinkys to JoeJulian
04:14 Technicool then says nm to self
04:14 UnixDev Technicool: I've tried that, but it works both ways
04:14 JoeJulian sunus: On vm-1 do: "ip addr show"
04:15 Technicool UnixDev, i would suggest leaving it on for now since that wasn't necessary before
04:15 Technicool turning on ino32 wasnt needed to run VM's in ESX i mean
04:16 UnixDev Technicool: and its not now, so why have it on?
04:16 Technicool UnixDev, because ESX cares now for some reason, so more likely to help you avoid pain than cause more
04:16 sunus sunus: vm-1
04:16 Technicool not strictly necessary, to your point
04:17 sunus JoeJulian: i issued the cmd from vm1, same as the previously successful create volume cmd
04:17 JoeJulian sunus: On vm-1 do: "ip addr show"
04:18 * JoeJulian rocks the up arrow.
04:18 Technicool UnixDev, I am curious as to whether you will see the split brain with just ESX
04:18 sunus JoeJulian: 192.168.10.2
04:18 Technicool sunus, you have three nodes or two?
04:18 sunus 3
04:19 Technicool can you paste the output from /path/to/glusterd/glusterd.info from all three?
04:20 sunus okay, wait a sec
04:20 JoeJulian Also, check the address on each. Make sure there are no surprises.
04:22 sunus JoeJulian: no surprises, i created a dist vol with 3 bricks like this and it succeeded
04:22 sunus JoeJulian:  volume create dist_vol transport tcp dist_vol  192.168.10.2:/gfdata/dist_brk1 192.168.10.3:/gfdata/disk_brk2  192.168.10.4:/gfdata/disk_brk3   it was successful
04:23 JoeJulian I know but since this doesn't make any sense, I have to check all the possible stupid mistakes and rule them out.
04:24 JoeJulian Technicool: status shows they each have different uuids.
04:24 sunus JoeJulian: yeah, it really confuses me a lot; when i delete the previously successful volume, i can create it again. the stripe volume..
04:24 Technicool i thought he had three nodes though?
04:24 sunus Technicool: yeah, i have three node
04:24 Technicool closed the pic, moment
04:24 JoeJulian http://www.flickr.com/photos/60064288@N02/8123962851/
04:24 glusterbot Title: Screenshot from 2012-10-26 12:08:10 | Flickr - Photo Sharing! (at www.flickr.com)
04:25 JoeJulian See, three servers.
04:25 JoeJulian Connected to an undisclosed number of nodes. :P
04:25 yeming joined #gluster
04:25 sunus undisclosed number ? what?
04:26 Technicool JoeJulian, yep saw that but didn't see any issue from thew output
04:26 Technicool state looks good, uuid's are unique and consistent
04:26 Technicool ah
04:26 Technicool wait
04:26 JoeJulian Right. It shows three servers each with different uuids.
04:26 sunus Technicool: wow, what do you find?
04:26 Technicool right, so, checking glusterd.info
04:26 Technicool sunus, nothing i thought i was hip cool and smart
04:26 Technicool then reality struck
04:26 JoeJulian lol
04:27 Bullardo joined #gluster
04:27 Technicool was thinking you provisioned from a VM template, but if that was the case, you wouldn't have been successful with creating the previous volume
04:27 Technicool either way, the glusterd.info output will tell us for sure that there isn't anything wonky there
04:28 JoeJulian sunus: I'm picking on Technicool about his incorrect use of the word "node" which is technically any endpoint. Could be a server, a client, or other things that won't be mentioned.
04:28 sunus Technicool: in order to prevent that, i actually don't use any vm template, i installed the 3 nodes manually
04:28 Technicool sunus, its a good plan
04:28 Technicool lol@ JoeJulian
04:29 sunus Technicool: where to get the glusterd.info?
04:29 Technicool sunus, wherever your config directory ended up
04:29 Technicool /opt/gluster?
04:29 sripathi1 joined #gluster
04:29 Technicool havent installed from source since 3.1.x
04:29 JoeJulian /usr/local/var/lib/glusterd?
04:29 Technicool or not much longer after that at any rate
04:30 Technicool that looks even better
04:30 Technicool and use his version of node
04:30 JoeJulian I've got a middle node right here for you, Technicool.
04:30 Technicool lol
04:31 sunus JoeJulian:  Technicool: i have something to take care of right now, can i see you guys soon? maybe in an hour?
04:31 Technicool im very proud of myself for refraining from the obvious segues into "never touch the backends" jokes
04:31 bulde1 joined #gluster
04:31 JoeJulian rofl...
04:31 Technicool sunus, hopefully not but from the way tonight looks, most likely
04:31 Technicool ')
04:31 Technicool ;)
04:31 sunus JoeJulian: Technicool: i really really want to stay to get this problem solved, but i gotta go now.. hope i can see you!
04:31 JoeJulian sunus: Maybe. It's getting late and my daughter needs some dinner. She just woke up from a very late (and long) nap.
04:32 sunus I forget you have families..
04:32 JoeJulian No, just one.
04:32 sunus haha
04:32 JoeJulian :)
04:32 Technicool psh, families....what do they ever give me?
04:32 Technicool oh right, love
04:32 Technicool always forget that one
04:32 sunus still don't find glusterd.info
04:32 deepakcs joined #gluster
04:32 Technicool find / | grep glusterd should finish before you get back
04:32 JoeJulian Before you leave, start: find / -name glusterd.info
04:33 sunus that is what i did^^
04:33 sunus hoping it get what i want
04:33 JoeJulian find / -name vols
04:33 sunus i get that
04:33 sunus wait a sec, now upload
04:34 sunus glusterd.info it's only uuid
04:35 sunus uuids are same as peer status
04:37 JoeJulian Oh! What if you try creating the volumes using CamelCase instead of underscores ("_")?!
04:37 * JoeJulian has an idea
04:37 sunus i willl try
04:37 Technicool that better not be a thing
04:37 JoeJulian I know, right?
04:39 sunus volume create StrVol stripe 3 192.168.10.2:/gfdata/StrBrk1 192.168.10.3:/gfdata/StrBrk2 192.168.10.4:/gfdata/StrBrk3         it's the same..
04:39 JoeJulian crap
04:39 sunus what?
04:39 JoeJulian I was hoping I had an idea.
04:40 sunus lol
04:40 JoeJulian Is it still .3 that fails?
04:40 sunus yeah
04:40 sunus but i tried using .2 and .4 still get the same,
04:40 JoeJulian Ah, with stripe 2 I presume?
04:40 sunus yeah
04:40 Technicool whew...i was about to break out the lucky charms and Lagavulin for good luck
04:40 sunus then .4 fails
04:41 JoeJulian Interesting.
04:41 sunus with same message
04:41 JoeJulian I'll brb and see if I can repro.
04:41 sunus lol, i really got to go now, if i come back and you are still here
04:41 sunus hoping to meet you then:) it looks like i say it to a sweet girl:)
04:41 sunus bye
05:05 zhashuyu joined #gluster
05:08 JoeJulian Nope, can't repro.
05:32 puebele joined #gluster
05:35 sunus JoeJulian: hi i am back
05:35 sunus JoeJulian: are you still here?
05:43 lng joined #gluster
05:44 lng Hi! Man quote: "Make sure you start your volumes before you try to mount them or else client operations after the mount will hang". I have noticed if cluster becomes offline for some reason, clients also hang. Is it possible to avoid this behaviour?
05:45 avati_ joined #gluster
05:48 tru_tru joined #gluster
05:52 badone joined #gluster
06:11 yeming joined #gluster
06:13 yeming My bricks are all 300G. Can I use stripe or something to make my gluster support files larger than 300G, say, 1T?
06:20 Humble joined #gluster
06:30 JoeJulian yeming: Yep, that's the most accepted purpose of stripe.
06:31 JoeJulian ~stripe | yeming
06:31 glusterbot yeming: Please see http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/ about stripe volumes.
06:31 JoeJulian lng: Are the clients hanging for more than 42 seconds?
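The 42 seconds refers to the default network.ping-timeout, after which a client gives up on an unreachable brick; it can be tuned per volume, though very low values are usually discouraged. The volume name and value here are hypothetical:

    gluster volume set myvol network.ping-timeout 20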
06:31 JoeJulian sunus
06:36 guigui1 joined #gluster
06:37 yeming JoeJulian: I've been reading the same. But when I use stripe and check the file created on brick, it's the same size as the original file.
06:38 yeming Is it supposed to be half the size if stripe # is 2?
06:38 JoeJulian not really, it's a sparse file.
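A quick way to see the difference yeming is asking about is to compare apparent size with allocated blocks on a brick; the path is hypothetical:

    ls -lh /bricks/brick1/bigfile   # apparent size, same as the original file
    du -h  /bricks/brick1/bigfile   # blocks actually allocated on this brick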
06:39 lkoranda joined #gluster
06:39 yeming I see.
06:41 sripathi joined #gluster
06:44 lkoranda joined #gluster
06:45 Nr18 joined #gluster
06:52 ctria joined #gluster
06:59 sripathi joined #gluster
07:05 ramkrsna joined #gluster
07:28 ngoswami joined #gluster
07:30 TheHaven joined #gluster
07:53 anti_user joined #gluster
07:54 anti_user hello! i came back from testing plants xD
07:55 anti_user so i created 6 virtual machines, with identical configuration: rootfs = 5GB, homefs = 10GB
07:55 anti_user homefs on each machine i'm using fully for the datastore
07:56 anti_user then i probed 5 nodes (gluster-node-02 03 04 05 06)
07:56 anti_user and created the datastore with replica 2
07:57 anti_user then i added 4 bricks
07:58 anti_user in the end i have 30GB of effective datastore
08:00 anti_user but when i place an 11 GB file into /mnt/datastore where my datastore is mounted (gluster-node-01:datastore on /mnt/datastore type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)), at the end i get an error that the partition is full
08:00 anti_user and yes! the partition /home is full, but why? i have a 30GB datastore
08:02 dobber joined #gluster
08:02 anti_user and another fact: /home on my brick-03, brick-04 and brick-06 is absolutely not used
08:03 anti_user and brick-05 too
08:04 Triade joined #gluster
08:06 anti_user Volume Name: datastore
08:06 anti_user Type: Distributed-Replicate
08:06 anti_user Volume ID: c52eaed1-2087-48cf-9c59-06c4864366d2
08:06 anti_user Status: Started
08:06 anti_user Number of Bricks: 3 x 2 = 6
08:06 anti_user Transport-type: tcp
08:06 anti_user Bricks:
08:06 anti_user Brick1: gluster-node-01:/home
08:06 anti_user Brick2: gluster-node-02:/home
08:06 anti_user Brick3: gluster-node-03:/home
08:06 anti_user Brick4: gluster-node-04:/home
08:06 anti_user Brick5: gluster-node-05:/home
08:06 anti_user Brick6: gluster-node-06:/home
08:06 anti_user Options Reconfigured:
08:06 anti_user auth.allow: 192.168.56.*
08:06 anti_user was kicked by glusterbot: JoeJulian
08:09 JoeJulian @op
08:09 tjikkun_work joined #gluster
08:11 JoeJulian @invite anti_user
08:11 glusterbot JoeJulian: The operation succeeded.
08:11 anti_user joined #gluster
08:12 JoeJulian anti_user: Did you add bricks after initially creating your volume?
08:12 anti_user yes
08:13 JoeJulian Did you rebalance, or at the very least rebalance fix-layout?
08:13 anti_user i do rebalance not fix-layout
08:15 JoeJulian Well, my understanding is that a normal rebalance should also fix-layout before it migrates files. I wonder if I'm wrong about that.
08:15 anti_user git checkout is today - fresh install
08:16 anti_user Number of Bricks: 3 x 2 = 6
08:16 anti_user true configuration?
08:17 JoeJulian When you add bricks to a volume, the distribute hashes aren't reset to allow new files to be placed on those new bricks. New directories may be, but files in existing directories won't.
08:18 JoeJulian That's why the rebalance is necessary
08:18 anti_user okay, what should i do now to perfom normal operation?
08:19 anti_user /dev/sdb1                  9.7G  9.7G     0 100% /home  gluster-node-01:datastore   29G   10G   18G  36% /mnt/datastore
08:19 kd joined #gluster
08:20 JoeJulian gluster volume rebalance datastore start
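For reference, the related rebalance commands discussed here; fix-layout only rewrites directory layouts so new files can land on new bricks, while a plain rebalance also migrates existing files:

    gluster volume rebalance datastore fix-layout start
    gluster volume rebalance datastore start
    gluster volume rebalance datastore status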
08:21 anti_user whoom its working now, but i did it earlier, strange
08:22 anti_user and gluster me said that command completed sucessfully,
08:22 anti_user *sorry for my english, said to me)))
08:23 anti_user im russian engeneer
08:23 kd left #gluster
08:23 JoeJulian So "gluster volume rebalance datastore status" shows it's moving stuff?
08:24 anti_user that show me completed, when i have this problem
08:24 JoeJulian /home is still 100%?
08:25 anti_user now my home is 82% full
08:25 JoeJulian Oh good. At least it's an improvement.
08:25 JoeJulian I would try the rebalance again
08:26 anti_user yeah, but i think that's incorrect, because the rebalance status said completed
08:26 JoeJulian I don't know why, but when I tried that today I had several files that failed to move but moved in subsequent tries.
08:26 JoeJulian Right, so rebalance..start again.
08:27 anti_user /dev/sdb1                  9.7G  7.4G  1.8G  82% /home gluster-node-01:datastore   29G   15G   13G  55% /mnt/datastore now
08:27 anti_user # du -hs /mnt/datastore  15G
08:29 anti_user hmm status show me 2 failures on first node
08:30 anti_user okay, next strange
08:30 JoeJulian I plan on asking one of the devs about that and probably filing a bug tomorrow. Right now, though, it's 01:30am here and I'm falling asleep at the keyboard. I better go hit the pillow. Good luck.
08:30 anti_user now i have on first and second nodes 82% full
08:31 anti_user oooh! good night
08:31 JoeJulian And remember to use fpaste.org ;)
08:31 anti_user and thanks again
08:31 anti_user okay! ill use it
08:41 Tarok joined #gluster
08:51 deepakcs joined #gluster
08:53 Hex_ joined #gluster
08:56 sunus JoeJulian: hi are you here?
08:57 Hex_ hi everyone
08:58 Hex_ I am desperately looking for an answer to the following question:
08:58 sripathi1 joined #gluster
08:58 sunus Hex_:  me too!
08:59 Hex_ great - let's have a look together...
09:00 Hex_ I am changing the hostname of a server
09:01 Hex_ how can I notify all peers in my gluster cluster of this change?
09:05 anti_user i think that you should change it in /var/lib/gluster/peers/
09:06 sunus JoeJulian: one moment!
09:07 Tarok joined #gluster
09:09 Hex_ anti_user: you mean the files in /etc/glusterfs/? (/var/lib/gluster does not exist here)
09:09 anti_user im using latest ver of glusterfs
09:10 anti_user please find directory with your peers
09:10 Hex_ right; I am using a 3.2.x one
09:10 anti_user now you should find files with peers
09:10 Hex_ the folder with uuids is in /etc/glusterd/peers, is that the one you mean?
09:11 anti_user yeah!
09:11 anti_user find file with your old peers and rename it
09:11 Hex_ ok - what happens to the rest of the config when I change that file
09:11 anti_user and do backup of this file
09:11 anti_user restart gluster service
09:12 Hex_ the hostname is also in vols/volname/info and others
09:12 anti_user hmm then stop and try to remove brick and add a new one
09:13 anti_user gluster volume replace-brick dist rhs-lab3:/data/dist rhs-lab4:/data/dist start
09:14 Hex_ yes, but rhs-lab3 and rhs-lab4 are the same machines in my case
09:14 anti_user lab3 is the old brick, lab4 the new brick
09:14 anti_user its for example
09:15 anti_user you should delete old brick and add new (if you change hostname)
09:16 anti_user and do peer probe before
09:16 Hex_ ok - thanks!
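A sketch of the full replace-brick sequence around anti_user's example above; hostnames and paths are the example's own, and the status/commit steps are the usual follow-up:

    gluster peer probe rhs-lab4
    gluster volume replace-brick dist rhs-lab3:/data/dist rhs-lab4:/data/dist start
    gluster volume replace-brick dist rhs-lab3:/data/dist rhs-lab4:/data/dist status
    gluster volume replace-brick dist rhs-lab3:/data/dist rhs-lab4:/data/dist commit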
09:18 anti_user let me know your results
09:38 sunus when i create a volume with 3 bricks on 3 servers, i can not create new volumes anymore; in fact, the machine that issued the command (create volume) didn't even send it out to the other servers
09:46 pkoro joined #gluster
09:58 tryggvil joined #gluster
10:03 ramkrsna joined #gluster
10:10 sunus can a glusterd ONLY control 1 volume?
10:27 Humble joined #gluster
10:30 ekuric joined #gluster
10:51 anti_user okay, found another bug in the git version
10:51 anti_user i have 6 nodes and one datastore
10:52 anti_user on each server i have 10GB of disk space, and 30GB available on the cluster
10:52 anti_user i want to dd a 15GB file
10:52 anti_user but i couldn't, the cluster works only with node 1 and node 2
10:53 anti_user the other nodes (3,4,5,6) are free
10:53 anti_user i did rebalance and rebalance force too, but it didn't help
11:07 mgebbe joined #gluster
11:10 mgebbe_ joined #gluster
11:10 Sjoerd_ joined #gluster
11:22 nightwalk joined #gluster
11:22 deckid joined #gluster
11:27 bala joined #gluster
11:35 tryggvil_ joined #gluster
11:37 deckid left #gluster
11:39 ekuric1 joined #gluster
11:43 ekuric1 left #gluster
11:45 deckid joined #gluster
11:57 sripathi joined #gluster
12:07 TheHaven joined #gluster
12:08 edward1 joined #gluster
12:25 royh left #gluster
12:41 qubit left #gluster
12:56 13WABPA6E joined #gluster
12:57 mdarade1 left #gluster
12:57 balunasj joined #gluster
12:59 plarsen joined #gluster
13:00 aliguori joined #gluster
13:01 guigui1 joined #gluster
13:04 johnmark interesting - http://rubygems.org/gems/gflocator
13:05 glusterbot Title: gflocator | RubyGems.org | your community gem host (at rubygems.org)
13:11 * ndevos doesnt do ruby
13:12 dmachi1 JoeJulian: What does "mostly" synchronous mean?  The behavior I see is that sometimes I get the extended attributes back and sometimes not (no errors)
13:16 puebele1 joined #gluster
13:17 manik joined #gluster
13:31 Nr18_ joined #gluster
13:43 tryggvil joined #gluster
13:48 rwheeler joined #gluster
13:51 mtanner joined #gluster
13:53 tryggvil joined #gluster
13:54 tmirks joined #gluster
14:00 tryggvil joined #gluster
14:01 stopbit joined #gluster
14:17 Nr18 joined #gluster
14:17 wushudoin joined #gluster
14:21 robo joined #gluster
14:25 chouchins joined #gluster
14:28 Kaza1 joined #gluster
14:29 Kaza1 Hello, a quick question: when I start a new 2 node replicate volume with existing data in the brick of one node, will it be copied automatically to the empty brick of the other node?
14:37 guigui1 joined #gluster
14:39 UnixDev I added a brick to a volume with data and 1 brick. when I added it, i changed the replica to 2. how long before the data is in sync? and is there some type of progress bar? when I do volume heal vol info it looks like split-brain but it seems to be in progress
14:39 UnixDev Kaza1: what version are you using?
14:43 Kaza1 It will be 3.3
14:46 Kaza1 Actually, we want to migrate to 3.3 from the former Gluster SSA
14:49 VisionNL granted, we are using a known mem leaking version and punching it here and there, but still a lot of mem in use on the client: 9530 root      15   0 14.1g  13g 1696 S 123.6 22.1   4769:28 glusterfs
14:49 VisionNL version 3.2.5 with ~100TB of file volume in ~ 10M files
14:49 VisionNL ow, and it still works :-)
14:51 waldner left #gluster
15:05 hchiramm_ joined #gluster
15:06 ika2810 joined #gluster
15:06 semiosis :O
15:11 samppah :O
15:16 usrlocalsbin joined #gluster
15:16 usrlocalsbin left #gluster
15:35 Sjoerd_ when I do a volume replace-brick operation, gluster returns 'operation failed on server_x'; how do I find what the error is?
15:38 madphoenix joined #gluster
15:40 madphoenix I have a four-node distributed volume on glusterfs 3.3, and am trying to understand what happens in various failure scenarios.  If one of the four nodes comprising that volume becomes unavailable (network, hardware, OS crash, etc.), what happens to the volume?  Does it stay online, take itself offline, etc.?
15:41 JoeJulian madphoenix: without replication the volume stays online but the files that are on the missing brick(s) will not be available. Additionally files that /should/ be created on the missing brick(s) won't be able to be created until the missing bricks return.
15:43 madphoenix So does that mean every fourth file would fail to write?
15:44 madphoenix or would I just not be able to write to the volume at all?
15:45 JoeJulian In a perfect world, yes, every 4th.
15:45 madphoenix heh ;)
15:45 madphoenix I'm not sure that's actually desirable - it seems to me that better behavior would be to just enter read-only mode
15:46 Bullardo joined #gluster
15:46 JoeJulian Since you don't reverse-engineer the hashing algorithm to determine filenames, it'll probably not be every 4th.
15:46 agwells07142 left #gluster
15:47 madphoenix At that point I suppose it's up to the writing application to re-try the write, or just fail
15:47 JoeJulian Well, you know by hash whether or not the brick is missing for that filename. If there's a sticky pointer, then you know the file is not accessible. If the hash points to an existing brick and there's no sticky pointer, why not?
15:48 JoeJulian Yes. Or perhaps your app could try different filenames until it succeeds?
15:48 * JoeJulian prefers replication.
15:49 madphoenix For our in-house apps that may be workable, but we have a fair number of other applications that will not be aware that they are writing to gluster...
15:50 madphoenix I agree on replication for HA situations.  But losing half of our capacity really bites ;).  Basically we have a scenario where we could live with a quarter of our files being offline for n number of hours (for hw replacement or whatever), but we'd still like to be able to write to the remaining online bricks during that period as well
15:50 madphoenix and then rebalance once the failed brick comes back online
15:50 madphoenix again, in an ideal world ;)
15:50 Sjoerd_ when I do a volume replace-brick operation, gluster returns 'operation failed on server_x'; how do I find what the error is?
15:50 guigui1 left #gluster
15:51 bala joined #gluster
15:53 JoeJulian madphoenix: The problem with allowing just anything to write is that the file may exist already on the missing brick. Since that possibility can be pretty safely guessed, that's why it'll error on those files.
15:53 JoeJulian Sjoerd_: Check your log files. Probably /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
15:56 UnixDev joined #gluster
15:57 madphoenix JoeJulian: I guess I don't follow.  In a distributed volume where every file only exists once, how could a new file being written exist on the dead brick?
15:59 JoeJulian johnmark: not sure how interesting a daemon is for querying the output of getfattr -n trusted.glusterfs.pathinfo though. Even less interesting that it's ruby. :P
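For reference, the xattr JoeJulian mentions is queried on a fuse mount and reports which brick(s) hold a given file; the mount path is hypothetical:

    getfattr -n trusted.glusterfs.pathinfo /mnt/myvol/some/file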
16:00 dmachi1 madphoenix: it wouldn't know if it was new or not, because it may exist on the dead brick already as I understand it.
16:00 johnmark JoeJulian: lol
16:00 johnmark I'm just happy someone thought of it
16:01 dmachi1 JoeJulian: can you explain to me what you meant by "mostly" sync setting of xattr?
16:01 plarsen joined #gluster
16:02 JoeJulian UnixDev: regarding your question about converting to replica 2... As an annoying answer to your first question, "How long is a string?" No, there is no progress bar. When volume heal $vol info stops listing files, it's done.
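A sketch of watching heal progress with the commands JoeJulian refers to; the volume name is hypothetical, and the extra info variants are 3.3 subcommands:

    gluster volume heal myvol info              # files still pending heal
    gluster volume heal myvol info healed       # recently healed files
    gluster volume heal myvol info heal-failed  # files the self-heal daemon could not heal
    gluster volume heal myvol info split-brain  # files in split-brain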
16:02 Sjoerd_ JoeJulian: yes, the logfile tells me that according to server_x a replace brick op has already been started
16:02 madphoenix dmachi1: so you're saying any time a new file is written, its hash is compared to existing hashes, and if it exists already the write just returns without doing anything?  if so i can see how that makes sense
16:02 Sjoerd_ JoeJulian: but how do I find out what that operation was, and how do I stop it?
16:03 akadaedalus joined #gluster
16:08 dmachi1 madphoenix: I mean that is the reason that it'll error.  When it tries to write, the file may exist on the missing brick. …"Since that possibility can be pretty safely guessed, that's why it'll error on those files."
16:17 akadaedalus left #gluster
16:18 UnixDev JoeJulian: is there a way to set the number of workings on the heal?
16:18 UnixDev workers*
16:20 JoeJulian madphoenix: The Distribute Hash Translator creates a hash based on the filename. Each directory is tagged with a dht mask that is different on each dht subvolume (in this case, single brick). If a brick is removed from operation, a file that should exist on it won't be able to find a matching mask.
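The per-directory layout JoeJulian describes is stored as an extended attribute on each brick; a sketch of inspecting it, with a hypothetical brick path:

    getfattr -n trusted.glusterfs.dht -e hex /bricks/brick1/some/dir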
16:20 JoeJulian UnixDev: gluster volume set $vol cluster.background-self-heal-count N
16:22 UnixDev JoeJulian: ahh, nice, didn't see that in the docs
16:23 JoeJulian I'm pretty sure it's still marked NODOC in the source.
16:24 UnixDev lol, nice. I will have to look at the source more
16:24 UnixDev what do you recommend for cluster.data-self-heal- algorithm?
16:24 UnixDev docs say reset is an option, but console says only full or diff
16:25 JoeJulian Some like full, some don't. I'm still on the fence.
16:25 semiosis depends on your use case
16:25 semiosis _Bryan_ and I prefer full
16:25 semiosis even tho our use cases are very different
16:25 UnixDev use case is virtual disks
16:25 semiosis probably diff then
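For reference, the option under discussion is set per volume; the volume name is hypothetical:

    gluster volume set myvol cluster.data-self-heal-algorithm diff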
16:26 UnixDev whats the default value for cluster.background-self-heal-count?
16:26 semiosis someone told me 16, and while i dont doubt it, that's not authoritative info
16:27 UnixDev seems io thread usage increased for me when I upped it, so maybe it was less
16:28 Mo__ joined #gluster
16:29 JoeJulian priv->background_self_heal_count = 16;
16:30 UnixDev JoeJulian: any other params that may speedup heals?
16:31 JoeJulian ip link set eth0 mtu 9000
16:31 UnixDev one thing is certain, network and disk io is reduced with diff
16:31 UnixDev yeah, they're at 9000
16:32 UnixDev 802.3ad bond, quad gig
16:35 hagarth joined #gluster
16:35 UnixDev JoeJulian: when using diff, does that also compress the data during transfer?
16:48 JoeJulian Ok... where was I...
16:49 JoeJulian I think it's better to use full if you expect like more than 50% of a file is likely to have changed if you have a brick outage.
16:50 JoeJulian And Sj left so I guess he gets no help. :/
16:51 JoeJulian dmachi1: By mostly I'm referring to write caches. I /think/ (and maybe jdarcy can chime in on this one) a write operation returns as long as the data's written to one disk. The remainder could possibly be in a write cache. You can control this using standard posix methods.
16:52 jdarcy The answer's actualy a bit more complicated.
16:54 jdarcy A write is returned when it's written to *all* AFR replicas (this doesn't count gsync) unless write-behind is active, in which case it might still be in the client.
16:54 usrlocalsbin joined #gluster
16:55 jdarcy If O_SYNC is enabled, or an fsync is done, then we honor those.  One curious side effect is that write-behind plus O_SYNC actually "pushes the data further" than with neither.
16:56 usrlocalsbin left #gluster
16:57 Bullardo joined #gluster
17:00 JoeJulian looks like a good topic for your next blog post. I've always been a little unclear on all that.
17:01 Bullardo joined #gluster
17:02 JoeJulian write-behind is on by default... Why could it be that sometimes there's enough of a lag between writing to one replica and the next that a human could actually catch it? There's frequently someone complaining that their write didn't show up on one brick.
17:05 tc00per joined #gluster
17:06 jdarcy It should be extremely rare.  Mostly I think a lot of networks are crappier than I'd expect.
17:07 jdarcy The frequency of split-brain errors and such due to simple loss of network connectivity never ceases to surprise me.
17:07 JoeJulian true
17:07 JoeJulian at least when I get a split brain it's because I royally screwed something up.
17:13 Ryan_Lane joined #gluster
17:13 NuxRo guys, is it possible to migrate from a distributed volume to a replicated one?
17:17 JoeJulian NuxRo: Yep, gluster volume add-brick $volume replica 2 $newbrick
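A sketch of the conversion JoeJulian describes for a single-brick volume, with hypothetical names; on 3.3 a full heal can then be triggered to copy the existing data onto the new replica:

    gluster volume add-brick myvol replica 2 server2:/bricks/brick1
    gluster volume heal myvol full
    gluster volume heal myvol info   # repeat until no entries remain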
17:19 UnixDev is there any way for gluster nds to support thin provisioning?
17:19 UnixDev nfs*
17:20 glusterbot New news from resolvedglusterbugs: [Bug 764678] refine invocation of external commands by gsyncd <https://bugzilla.redhat.com/show_bug.cgi?id=764678>
17:24 UnixDev is the best way to replicate a brick initially rsync?
17:26 jdarcy UnixDev: Filesystems are kind of inherently thin provisioned.  Or am I misunderstanding what you want?
17:28 JoeJulian UnixDev: Not sure about how vmware does anything. You can make sparse files via nfs. I think I heard there's some issue with sparse allocations on xfs, but that's at the filesystem level (and I could be completely wrong).
17:28 UnixDev jdarcy: I'm vmware, you can create a disk thats thin provisioned. so the file size grows as the data grows (instead of creating say a 60 gb file for a disk) . Vmware says this depends on your NFS. When using gluster as the NFS, the files are not thin. They take up full amount of space.
17:28 UnixDev JoeJulian: I've tried this on Xfs and Ext4, same issue
17:28 UnixDev its not the filesystem, its something about the gluster nfs
17:29 UnixDev I'm=in
17:29 JoeJulian UnixDev: And why do people think that rsync is a "better" way than self-heal? It's all got to move the same data over the network. I never quite understand that.
17:30 UnixDev JoeJulian: I'm asking because I see the disk writes and bandwidth usage and its not moving quickly at all
17:30 UnixDev not even 1meg of network, 4mb of writes per sec
17:30 UnixDev but there is 4 gigs of bandwidth available, something isn't right
17:31 UnixDev all I did was add a brick with replica 2 to a 1-brick volume. It should be copying data very quickly; it isn't
17:31 semiosis ~pasteinfo | UnixDev
17:31 glusterbot UnixDev: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
17:32 JoeJulian I'm going to go get my espresso. brb...
17:33 UnixDev mmmmm… espresso
17:34 UnixDev http://fpaste.org/QvDF/
17:35 UnixDev http://dpaste.org/wwQrc/
17:35 glusterbot Title: dpaste.de: Snippet #211865 (at dpaste.org)
17:35 semiosis UnixDev: i'm getting 500 errors at that url
17:35 UnixDev try the second one
17:35 UnixDev gpaste is broke
17:35 semiosis ha
17:35 semiosis didnt see that
17:36 jdarcy UnixDev: Whether to create sparse files or fully populated ones (XFS preallocation bugs notwithstanding) is pretty much up to the thing creating a file.
17:36 semiosis UnixDev: did you try reducing the background self heal count?  i've found 2-4 works pretty good
17:37 jdarcy UnixDev: Here's a quick test you can do: create an empty file, seek to 1GB, write one byte, then check the actual space usage.  If that only used one block, but VMware files take up more than they should, the problem is VMware's.
17:37 UnixDev jdarcy: with other NFS servers, it works as intended
17:38 jdarcy For all I know, VMware might be relying on specific non-standard interfaces to create sparse files instead of doing things the standard UNIX way.  NAS vendors might well implement those same interfaces, but that doesn't make them standard.
17:38 UnixDev :( I will see if I can get any more info
17:38 jdarcy You can use ftruncate to extend a file, you can use fallocate, you can write one block at the end of a file.  Those all can and do result in sparse files with GlusterFS.
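A minimal sketch of the test jdarcy suggests, run against a GlusterFS mount; the path is hypothetical:

    # write one byte at a 1GiB offset, which should produce a sparse file
    dd if=/dev/zero of=/mnt/vol/sparsetest bs=1 count=1 seek=1G
    ls -lh /mnt/vol/sparsetest   # apparent size around 1G
    du -h  /mnt/vol/sparsetest   # allocated size should be a few KB if sparseness is preserved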
17:48 s0| joined #gluster
17:52 mtanner joined #gluster
17:58 Technicool joined #gluster
18:05 johnmark heya
18:06 johnmark troubleshooting session in #gluster-meeting in a couple of minutes
18:08 johnmark invites going out
18:14 johnmark if you want to join the community office hours, comes to #gluster-meeting and see the video broadcast live here - http://www.youtube.com/embed/cB9UkyjIai4
18:14 glusterbot Title: Gluster Community Hangout - YouTube (at www.youtube.com)
18:14 NuxRo JoeJulian: thanks for that. What if my distributed non-replicated setup has 2 bricks? gluster volume add-brick $volume replica 2 $newbrick1 $newbrick2 ?
18:14 johnmark sorry, that's http://youtu.be/cB9UkyjIai4
18:14 glusterbot Title: Gluster Community Hangout - YouTube (at youtu.be)
18:15 edoceo joined #gluster
18:22 JoeJulian NuxRo: Yes
18:39 NuxRo JoeJulian: thanks
18:40 edoceo can I put a distribute volume over a replicate volume to get both features?
18:41 semiosis edoceo: you can create a distributed-replicated volume by specifying a "replica N" parameter then giving MxN bricks where M is the number of replica sets to distribute over
18:42 JoeJulian edoceo: Yes, in fact that's the default behavior when you add bricks in multples of your replica setting.
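A sketch of what semiosis and JoeJulian describe: replica 2 with four bricks gives two replica pairs that files are distributed across; hostnames are hypothetical:

    gluster volume create myvol replica 2 \
        server1:/bricks/b1 server2:/bricks/b1 \
        server3:/bricks/b1 server4:/bricks/b1
    # bricks are grouped into replica sets in the order given:
    # (server1,server2) and (server3,server4)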
18:42 semiosis edoceo: and welcome back!
18:44 mtanner joined #gluster
19:06 jbrooks joined #gluster
19:11 edoceo semiosis: thanks!
19:11 semiosis yw
19:11 edoceo Does distribute work in round-robin?  How does that play out - in the 3.0 series....
19:12 semiosis hashing
19:12 edoceo It seems like when I have distribute on top of replicate that some stuff is written to replica1 while other stuff is on replica3
19:12 semiosis @split-brain
19:12 glusterbot semiosis: (#1) learn how to cause split-brain here: http://goo.gl/nywzC, or (#2) To heal split-brain in 3.3, see http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ .
19:12 UnixDev would it be safe to share a gluster fuse mount via iscsi?
19:13 semiosis UnixDev: iscsi is block level (SAN-ish) glusterfs is file level (NAS-ish) so i dont even understand the question
19:13 UnixDev semiosis: you can share a file as an iscsi target on linux… if that file was on glusterfs mounted via fuse
19:15 semiosis ah ok
19:15 UnixDev semiosis: the idea is there are several gluster mounting via fuse and sharing via iscsi
19:15 UnixDev for multi-pathing
19:15 UnixDev does that sound feasible or just a split-brain waiting to happen?
19:16 semiosis i'm lost again :)
19:16 UnixDev lol
19:16 UnixDev so you have this file on gluster, exported via iscsi on 1.1.1.1
19:17 UnixDev 1.1.1.2 also mounts the gluster vol.. .and exports via iscsi as well
19:17 UnixDev the same file
19:17 UnixDev so you have your hypervisor (vmware in my case) use a iscsi target with multiple ips for redundancy
19:17 semiosis then what happens?
19:17 semiosis so, one client
19:17 NuxRo semiosis: re the split-brain causes article, if that's all I need to avoid then it looks like an easy job
19:17 UnixDev well, many clients
19:17 semiosis in the high level sense
19:18 semiosis UnixDev: i mean, one process writing to that file, even tho yes many glustefs clients
19:18 semiosis s/process/vm/
19:18 glusterbot What semiosis meant to say was: UnixDev: i mean, one vm writing to that file, even tho yes many glustefs clients
19:18 UnixDev but what if two tried to write to it?
19:18 UnixDev or one cluster node failed, so it tried to write on the other node
19:19 UnixDev then when it comes back, can it be split-brain?
19:19 semiosis NuxRo: just think very carefully about your architecture, consider if it's possible either of those scenarios could occur in some way, then test test test for failures
19:20 semiosis UnixDev: hard to say for sure but split brain seems possible yes
19:21 semiosis UnixDev: with careful testing though you can probably catch where it happens & iterate your architecture to avoid the split brain
19:22 UnixDev semiosis: I have tried a few ways, and split brains seem to be possible in every one
19:22 UnixDev :(
19:22 semiosis UnixDev: were we talking about a split brain due to VIP moving before replication was in sync?  or was that someone else?
19:22 UnixDev yeah that was me
19:23 UnixDev semiosis: native ifs is also having the thin-provisioning problem
19:23 UnixDev what about sharing fuse mount through native nds
19:23 UnixDev sorry, standard ifs daemon
19:23 UnixDev not xlator
19:23 semiosis well, did you try delaying moving the VIP over until the gluster-nfs server was talking to all bricks?  seemed like a good solution to the split-brain imho
19:24 UnixDev semiosis: I was experimenting with different timings
19:24 UnixDev still doing that, but also looking at other options
19:24 UnixDev nfs has these drawbacks also, no thin provisioning in esxi and no multi pathing
19:24 semiosis also you can still have high availability without automatic recovery... sometimes manual recovery is a good thing
19:25 semiosis automatic failover, manually switch back
19:25 UnixDev yes, I was thinking disabling auto failback
19:27 semiosis if making an automatically self-repairing or expanding storage cluster was easy, everyone would be doing it :)
19:27 semiosis these are two really hard administrative problems people often want easy solutions for: automatic cluster repair and automatic cluster scaling
19:30 johnmark semiosis: +1
19:30 UnixDev semiosis: would be nice to have the thin provisioning though :P
19:31 semiosis thin provisioning?
19:31 UnixDev semiosis: for some reason vmware is not thin-provisioning on xlator ifs :(
19:31 UnixDev nds*
19:31 UnixDev nfs***
19:31 Fabiom joined #gluster
19:31 semiosis UnixDev: which one of us had more beer at lunch?
19:31 semiosis hehehe
19:31 semiosis #fridays
19:31 UnixDev its this fucking autocomplete, its annoying
19:31 UnixDev if i keep typing, it replaces the word
19:32 semiosis bummer
19:34 JoeJulian file a bug
19:34 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
19:35 JoeJulian @hack
19:35 glusterbot JoeJulian: The Development Work Flow is at http://www.gluster.org/community/documentation/index.php/Development_Work_Flow
19:37 UnixDev ok, perhaps this is a bug… i create a 500gb sparse file with 1 byte written. ls shows 500 gig size (seems odd) . when i try to copy the file to file2, it should be quick, only 1 byte written. but it seems its trying to copy 500 gb
19:37 JoeJulian are you doing cp --sparse=always?
19:38 UnixDev nope, is that required ?
19:38 JoeJulian That's the way I always have done it.
19:39 UnixDev JoeJulian: still taking a shit
19:39 UnixDev even with that
19:39 JoeJulian hmm
19:39 * JoeJulian looks to see if jdarcy's wfh day has turned into taking the afternoon off from home day.
19:40 semiosis jdarcy: three words: glupy kio slave
19:41 semiosis :O
20:05 * jdarcy turned it into a "getting s*** done at work" day.  :-P
20:06 jdarcy I'm very close to having a new tool (plus AFR changes) to resolve split brain through the front end instead of logging into the servers.
20:09 saz_ joined #gluster
20:11 jdarcy ...and it works!
20:12 jdarcy [root@gfs-i8c-04 ~]# cat /import/vms/split
20:12 jdarcy cat: /import/vms/split: Input/output error
20:12 jdarcy [root@gfs-i8c-04 ~]# setfattr -n user.bless-file -v 0 /import/vms/split
20:15 jdarcy Unfortunately, while developing this I noticed a bug in how we handle split-brain already.
20:22 UnixDev jdarcy: your tool sounds magical… :D
20:24 jdarcy Yeah, but all magic comes with a price.  ;)
20:25 jdarcy As I'm sure you can imagine, any tool that works by deleting N-1 replicas of a file (as this does) could end up losing data if the remaining one isn't correct.
20:25 jdarcy It's better than mucking about on the bricks manually, but still a pretty sharp knife.
20:26 pdurbin jdarcy: i told you you're my go-to NFS guy... i feel compelled to link you to this blurb about nfswatch i just wrote: http://serverfault.com/questions/38756/analyzing-linux-nfs-server-performance/442827#442827
20:26 glusterbot Title: Analyzing Linux NFS server performance - Server Fault (at serverfault.com)
20:30 rubbs joined #gluster
20:31 jdarcy pdurbin: That's pretty cool.
20:33 rubbs I'm about to get my hands on 10 nodes to play with for Gluster, so I have a question if I use 3.3, is it feasible to set up a replicated VM disk image storage? if so do I want distributed? Striped? both? along with the replication?
20:33 pdurbin jdarcy: yeah. would love a "batch" mode like `top` has
20:34 jdarcy rubbs: In a nutshell, you'll get better data protection if you use replication, but you'll take a performance hit, and stripe isn't worth it.
20:35 Technicool joined #gluster
20:36 jdarcy http://review.gluster.org/#change,4132
20:36 glusterbot Title: Gerrit Code Review (at review.gluster.org)
20:36 rubbs jdarcy: ok, that makes sense, would distribution help? I'm guessing no in this case.
20:37 rubbs I doubt I could sell the management on a non-replicated drive.
20:37 rubbs er.. volume not drive
20:37 jdarcy rubbs: Distribution is always there, so no need to worry about it.
20:37 chouchins joined #gluster
20:38 rubbs jdarcy: ah. I think I get it now. Thanks. I'll read through the admin guide again to make sure I'm groking this all right. I may be back with more questions
20:38 semiosis a performance hit vs. data could vanish at any time sounds like a good trade to me
20:38 rubbs thanks for your help.
20:38 jdarcy Any time.
20:38 semiosis replication FTW
20:38 rubbs semiosis: depends on the application, but yeah, in most cases I would agree
20:38 jdarcy I really really really need to get going on PonyReplication.
20:38 semiosis ,,(ponies) :D
20:38 glusterbot http://hekafs.org/index.php/2011/10/all-that-and-a-pony/
20:38 rubbs PonyReplication?
20:39 pdurbin @lucky PonyReplication
20:39 glusterbot pdurbin: http://www.gluster.org/2012/06/never-trust-anyone-over-3-3/
20:39 jdarcy http://hekafs.org/index.php/2011/10/all-that-and-a-pony/
20:39 glusterbot Title: HekaFS » All That . . . And a Pony! (at hekafs.org)
20:39 pdurbin jdarcy: you give talks at http://www.bblisa.org !
20:39 glusterbot Title: BBLISA - Back Bay Large Installation System Administration (at www.bblisa.org)
20:40 jdarcy I gave one, yeah.  Adam and I used to work together.
20:40 pdurbin jdarcy: i've never been. is it a good group?
20:40 rubbs reading thanks...
20:40 jdarcy Yeah, it is.  I keep meaning to go back.
20:41 pdurbin cool, cool. thanks
20:41 rubbs also anyone know why there's a section called "create replicated volume" and "Create distrubuted replicated volume" if distributed is built in?
20:41 rubbs in the admin guide ^
20:42 jdarcy That's a really good question.  In earlier versions, you could create a replicated volume that didn't have distribute as well.  The code allows it, but in the current management code we always push the DHT (distribute) translator on top no matter what.
20:42 jdarcy Sounds like the admin guide needs to be updated to reflect that.  Thanks!
20:49 rubbs jdarcy: np, glad I could help, even in a small way
20:56 rubbs ok, another newb question, Since I'm looking at replication, is the performance of gluster's replication comparable in any way to DRBD? (I know that they operate differently, but my end-result would be the same)
20:57 Technicool rubbs, the difference is simply in the number of nodes if i understand you correctly
20:58 Technicool replicated meaning two nodes only....in this case, there would be no pairings to distribute to
20:58 rubbs well right now I know I could get DRBD running on a two node system, but I have 10 nodes coming and i'd like to not have to have 5 pairs of systems, I'd rather be able to migrate VMs to any of the 10 nodes.
20:58 Technicool the distribute-replicate ends up being anything with two or more pairings, but in the end, they are essentially one and the same if you can get over the semantics
20:59 rubbs but if DRBD's replication is that much faster, I might deal with the inconvenience
21:00 rubbs I may be thinking of all this in too newbish of a way.
21:00 semiosis +1
21:00 semiosis :)
21:01 semiosis people usually set up a storage cluster of glusterfs servers which are then accessed by client machines with the real workload
21:02 semiosis i suppose hadoop is directly opposed to that approach, but besides for that particular use-case i think separating storage from compute/web/whatever is good
21:04 rubbs I don't really have the hardware to have a storage cluster that is separate from a VM workload cluster. And while I'm trying to push us to a more abstract model of computing, we still have a lot of legacy stuff that requires "whole system" approaches.
21:04 rubbs This was going to be the first of many projects to try to get us to separate our storage from our computing.
21:05 rubbs but I've got a long way to go with that. (and lots of 'fights' with devs and management ;) )
21:05 semiosis heh
21:06 JoeJulian johnmark, Technicool: did you ever figure out your rdns thing?
21:10 hattenator joined #gluster
21:11 s0| I am new to glusterFS, I am reading the 3.3 doc and I see the note that says distributed striped replicated volumes only support Map Reduce at this time.   does that mean I can't use it as a traditional file store as well as a hadoop store?
21:12 s0| the exact quote that confuses me is " In this release, configuration of this volume type is supported only for Map Reduce workloads."
21:12 JoeJulian distributed striped replicated? Sheesh....
21:12 s0| that is what the guide calls it.
21:13 s0| 5.6 - Creating Distributed Striped Replicated Volumes
21:13 JoeJulian Oh, it's doable, just seems kind-of overkill to me.
21:13 JoeJulian However... if what you're asking is whether you can use that as traditional storage, yes.
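For reference, the 3.3 syntax for the volume type s0| is asking about combines stripe and replica counts; a sketch with hypothetical hostnames, where 8 bricks give 2 stripe x 2 replica x 2 distribute:

    gluster volume create testvol stripe 2 replica 2 transport tcp \
        server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 \
        server5:/exp5 server6:/exp6 server7:/exp7 server8:/exp8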
21:14 JoeJulian ~stripe | s0|
21:14 glusterbot s0|: Please see http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/ about stripe volumes.
21:15 avati_ joined #gluster
21:17 s0| I get what you are saying but this isn't a traditional file server or something, more like a hadoop cluster that might also sometimes just act like a bit of file server from time to time.
21:18 JoeJulian I don't see any reason that should be a problem.
21:18 s0| I did a replicate 3 before, without a stripe but either I horked the hadoop install or I also need to stripe for it to support an MR job.
21:19 JoeJulian You can also have more than one volume on a server. Doing that you could have a volume defined for differing workloads.
21:21 lh joined #gluster
21:21 glusterbot New news from resolvedglusterbugs: [Bug 830121] Nfs mount doesn't report "I/O Error" when there is GFID mismatch for a file <https://bugzilla.redhat.com/show_bug.cgi?id=830121>
21:22 s0| how would someone access the same thing (either via hadoop or the filesystem) if they were on different volumes ?
21:22 JoeJulian I don't know much about hadoop, but I thought each record was treated as a single-file object. If that's true, it seems pretty unlikely to me that you would ever get past the first stripe?
21:23 JoeJulian s0|: Just mount up the volume using fuse or nfs.
21:23 sazified joined #gluster
21:29 s0| well either the 3.3.0 admin guide is wrong or I am just still confused... thanks, off to just go test it and see what I can get
22:06 johnmark s0|: that doc is referring to recommended usage. that's a brand new feature for 3.3
22:07 johnmark s0|: it's not saying you can't, it's saying maybe you should think twice before you do
22:08 s0| it doesn't read that way to me. "In this release, configuration of this volume type is supported only for Map Reduce workloads." doesn't indicate "don't use this for MR" .... it seems to indicate "only use this layout for MR"
22:26 johnmark s0|: knowing who writes that, I'm pretty sure I know what they meant, but you're welcome to try it and see
22:27 s0| not trying to say you are wrong or anything like that, just saying why I am feeling confused.
22:34 Ryan_Lane left #gluster
22:36 andreask joined #gluster
22:57 edward1 joined #gluster
23:50 tc00per left #gluster
