
IRC log for #gluster, 2013-08-21


All times shown according to UTC.

Time Nick Message
00:07 JoeJulian Wow, after I recover from this I'm going to have to see if I can repro it...
00:21 jporterfield joined #gluster
00:21 johnmark JoeJulian: doh
00:23 JoeJulian johnmark: yeah, pretty bad... I suspect hagarth's upgrade directions don't work for replica 3
00:24 JoeJulian or... hmm...
00:25 JoeJulian nope, delete the hmm....
00:45 _pol joined #gluster
00:54 bfoster joined #gluster
00:54 portante joined #gluster
00:54 kkeithley joined #gluster
00:55 bdperkin joined #gluster
00:58 msvbhat joined #gluster
00:58 asias joined #gluster
01:03 DV joined #gluster
01:05 JoeJulian aha... Guess I can't entirely point my finger at gluster on this one... Apparently I had a dead xfs filesystem and didn't know it (damned nrpe was looking in the wrong place).
01:11 an joined #gluster
01:29 harish joined #gluster
01:33 robo joined #gluster
01:43 _pol joined #gluster
01:50 harish joined #gluster
02:05 ninkotech joined #gluster
02:16 bala joined #gluster
02:22 jporterfield joined #gluster
02:27 bharata joined #gluster
02:27 bdperkin joined #gluster
02:37 sprachgenerator joined #gluster
03:03 furkaboo joined #gluster
03:09 awheele__ joined #gluster
03:13 bulde joined #gluster
03:20 awheeler joined #gluster
03:23 shubhendu joined #gluster
03:28 bala joined #gluster
03:32 hagarth joined #gluster
03:41 itisravi joined #gluster
03:49 an joined #gluster
04:01 psharma joined #gluster
04:02 sgowda joined #gluster
04:13 ababu joined #gluster
04:21 shylesh joined #gluster
04:25 ndarshan joined #gluster
04:25 CheRi joined #gluster
04:30 saurabh joined #gluster
04:31 jporterfield joined #gluster
04:35 hagarth joined #gluster
04:42 psharma joined #gluster
04:50 aravindavk joined #gluster
04:50 ppai joined #gluster
04:53 ansemz joined #gluster
04:54 ansemz hello, everybody. I am a newbie.
04:59 mohankumar joined #gluster
05:03 ansemz1 joined #gluster
05:04 ansemz1 left #gluster
05:04 ansemz1 joined #gluster
05:09 rjoseph joined #gluster
05:16 nshaikh joined #gluster
05:17 rastar joined #gluster
05:18 annsem joined #gluster
05:18 annsem left #gluster
05:19 RameshN joined #gluster
05:23 shruti joined #gluster
05:26 ansemz joined #gluster
05:27 satheesh1 joined #gluster
05:28 ansemz hello, everyone.
05:29 ansemz my glusterfs client hung when running ls
05:29 ansemz no one talking?
05:32 raghu joined #gluster
05:32 psharma joined #gluster
05:37 vimal joined #gluster
05:40 jporterfield joined #gluster
05:48 basic` does glusterfs 3.4 use a different / new set of ports?
05:48 basic` i have a new range of ports i had to open up to get it working
05:48 basic` $IPTABLES -A IN_TCP -p tcp --dport 49152:49192 -j ACCEPT
05:49 bulde joined #gluster
05:51 nightwalk joined #gluster
05:59 lalatenduM joined #gluster
06:01 lalatenduM joined #gluster
06:01 rgustafs joined #gluster
06:05 vpshastry1 joined #gluster
06:07 jporterfield joined #gluster
06:07 ricky-ticky joined #gluster
06:10 harish joined #gluster
06:14 nightwalk joined #gluster
06:17 jporterfield joined #gluster
06:21 jtux joined #gluster
06:23 nightwalk joined #gluster
06:32 an joined #gluster
06:41 bstr joined #gluster
06:46 kanagaraj joined #gluster
06:51 vshankar joined #gluster
07:00 jporterfield joined #gluster
07:03 satheesh1 joined #gluster
07:15 jporterfield joined #gluster
07:16 guigui1 joined #gluster
07:17 fidevo joined #gluster
07:18 ngoswami joined #gluster
07:18 andreask joined #gluster
07:19 zetheroo joined #gluster
07:19 eseyman joined #gluster
07:19 JoeJulian @ports
07:19 glusterbot JoeJulian: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111.
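Pulling glusterbot's port list into firewall form, a minimal sketch in the style of basic`'s rule above (the IN_TCP chain and the size of the brick-port range are assumptions; bricks use one port each, counting up from 49152 on 3.4):
    $IPTABLES -A IN_TCP -p tcp --dport 24007:24008 -j ACCEPT   # glusterd management (+ rdma)
    $IPTABLES -A IN_TCP -p tcp --dport 49152:49192 -j ACCEPT   # bricks, glusterfs 3.4+
    $IPTABLES -A IN_TCP -p tcp --dport 38465:38468 -j ACCEPT   # gluster NFS + NLM
    $IPTABLES -A IN_TCP -p tcp --dport 111 -j ACCEPT           # rpcbind/portmap
    $IPTABLES -A IN_TCP -p udp --dport 111 -j ACCEPT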
07:23 mgebbe joined #gluster
07:26 zetheroo can glusterfs work with more than replica2 ?
07:26 zetheroo I see that version 3.4 is out now ... wondering if that brought more capability in this regard ...
07:26 ndevos hmm, I think nfs (not mountd etc) went to 2049/tcp in 3.4
07:27 ndevos zetheroo: it can (and could before), but it does not get tested that much, so there could be issues
07:28 ndevos zetheroo: if you test and hit problems, file a bug to get 3+ replica's improved
07:28 glusterbot http://goo.gl/UUuCq
07:30 zetheroo maybe I don't understand this correctly, but isn't gluster pretty limiting in that way in comparison to something like ceph?
07:31 zetheroo I could be understanding this totally wrong ...
07:31 zetheroo :P
07:31 ujjain joined #gluster
07:39 basic` JoeJulian: so… 49152 and above needs to be open?
07:41 zetheroo for instance, with glusterfs, is there any way to replicate over 4 nodes? (without being in a testing phase)
07:44 mooperd joined #gluster
07:51 JoeJulian zetheroo: yes, "replica 4"
07:51 JoeJulian That's why there's a number in that command.
07:51 zetheroo JoeJulian: but this is not "tried and proven" right!?
07:51 JoeJulian is anything?
07:51 zetheroo well replica 2 is ..
07:52 JoeJulian Ok
07:52 JoeJulian anyway...
07:52 JoeJulian I've been running replica 3 for years.
07:52 JoeJulian @replica
07:52 glusterbot JoeJulian: Please see http://goo.gl/B8xEB for replication guidelines.
07:53 JoeJulian Crap... 3.4.0 on my ancient fedora box crashes. :(
07:58 basic` i'm pretty sure we ran replica 4+ on ec2 at a past job
07:58 basic` on 3.2
07:58 JoeJulian replica 4 just seems like overkill to me.
07:59 basic` JoeJulian: so i have 3.4 deployed on my test infrastructure now… are there any special mount options i can pass for better performance, or gluster volume options i should be setting?
08:00 basic` JoeJulian: http://pastebin.osuosl.org/3043/ that is the current mount options + volume options
08:00 glusterbot Title: #3043 OSL Pastebin (at pastebin.osuosl.org)
08:01 zetheroo what about HA and glusterfs ... since we use glusterfs to run our VM's from we would like to have the VM's automatically "migrate" to another server if the one they are running on goes down ... is this possible with glusterfs?
08:01 basic` zetheroo: i just used that feature to upgrade from 3.3 to 3.4 :)
08:01 JoeJulian sure. You don't need multiple copies to do that, just access to them.
08:03 zetheroo basic`: are you using Proxmox?
08:04 basic` proxmox? no… should i be?
08:04 JoeJulian basic`: looks like there is a mount option that defaults to off: use-readdirp
08:05 basic` JoeJulian: ooh, where did you find that?
08:05 JoeJulian /sbin/mount.glusterfs
08:05 zetheroo basic`: what are you using to get the VM's to automatically migrate?
08:05 JoeJulian and /usr/sbin/glusterfs --help
08:05 * JoeJulian uses puppet and openstack
08:06 basic` zetheroo: oh the actual migration… we use ganeti or openstack depending on the cluster for that.  I thought you meant if one of the bricks went down
08:07 zetheroo ok
08:07 zetheroo well if a host goes down then the brick on that host will also have gone down ... :)
08:07 basic` JoeJulian: thanks, that's the help i was looking for… oh and the man pages are included in the community repo packages.. great :)
08:07 JoeJulian whew.... turns out el5 has a newer libaio than Fedora 6. That's why the client was crashing on me.
08:07 JoeJulian oh, right... man pages... ;)
08:08 JoeJulian bah, nope. I was wrong.
08:08 JoeJulian again...
08:08 aravindavk joined #gluster
08:09 satheesh joined #gluster
08:09 sgowda joined #gluster
08:10 zetheroo did you guys chose not to use Proxmox? - if so why?
08:14 _ilbot joined #gluster
08:14 Topic for #gluster is now Gluster Community - http://gluster.org | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
08:14 cfeller joined #gluster
08:14 basic` hmm… so i added 'use-readdirp=yes' to fstab and remounted but it isn't showing up in the mount options when i look at the 'mount' output
08:14 JoeJulian check ps
08:16 basic` ahh
08:16 basic` JoeJulian: thanks much :)
08:17 basic` , /usr/sbin/glusterfs --use-readdirp=yes
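For reference, a sketch of the fstab entry under discussion (volume and mount point names are placeholders); the option ends up on the glusterfs client process rather than in the 'mount' output:
    gfs1:/gv0  /mnt/gluster  glusterfs  defaults,use-readdirp=yes  0 0
    # verify on the running client process, not via 'mount':
    ps ax | grep '[g]lusterfs' | grep use-readdirp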
08:17 * JoeJulian grumbles about el5
08:17 basic` :( it can be a pain with the older packages
08:19 zetheroo shouldn't the amount of "Disk Space Free" be identical ?  http://paste.ubuntu.com/6009479/
08:19 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:19 JonathanD joined #gluster
08:22 JoeJulian pasteinfo | zetheroo
08:22 JoeJulian @pasteinfo | zetheroo
08:22 JoeJulian ~pasteinfo | zetheroo
08:22 glusterbot zetheroo: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
08:22 JoeJulian too tired...
08:25 JoeJulian for me... file a bug
08:25 glusterbot http://goo.gl/UUuCq
08:26 basic` JoeJulian: i'm exhausted as well… I have 3.4 now and I'm curious what kind of tweaks I can make to get things faster… I'll ping you tomorrow during the daylight if you don't mind
08:26 JoeJulian sounds good to me.
08:27 basic` readdirp seems to help a bit for ls -l
08:27 Norky joined #gluster
08:28 basic` thank you :)
08:28 zetheroo http://paste.ubuntu.com/6009505/
08:28 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:28 JoeJulian I wonder if this is an el5 thing or a 32bit thing... :/
08:30 JoeJulian I have no idea, zetheroo... maybe there's sparse files or something? 999 inodes different, too...
08:30 JoeJulian I guess I'd build trees to compare.
08:32 psharma joined #gluster
08:42 zetheroo is it normal that during the self heal process the bricks are pretty much useless, in the sense that VMs cannot run off of them?
08:42 JoeJulian no
08:42 JoeJulian well, maybe... actually I have no idea... I don't run things off the bricks. You're supposed to use a client.
08:44 zetheroo last week on our replica 2 setup I found that the self heal daemon was offline and then when I brought it back online the VM's crashed, libvirt crashed and anything I tried to do in the path of the bricks crashed ...
08:44 zetheroo there was a 1.2TB difference of data from one brick to the next
08:44 zetheroo oh? what do you mean by "I don't run things off the bricks. You're supposed to use a client."?
08:45 JoeJulian Oh yeah, I remember that. I tried to have you do some diagnostics, but one of us ran out of time...
08:45 zetheroo indeed ;)
08:45 JoeJulian The brick is only to be used for glusterfs storage.
08:45 JoeJulian @glossary
08:45 glusterbot JoeJulian: A "server" hosts "bricks" (ie. server1:/foo) which belong to a "volume"  which is accessed from a "client"  . The "master" geosynchronizes a "volume" to a "slave" (ie. remote1:/data/foo).
08:46 zetheroo so it's not a good idea to have the VM's running from the brick?
08:46 JoeJulian no
08:46 zetheroo hmm ... now I am confused .. :P
08:46 JoeJulian bricks -> clustered filesystem -> client -> useful things
08:46 Norky note that the client can be running on the same machine as the server
08:47 JoeJulian True.
08:47 jtux joined #gluster
08:48 JoeJulian That would be like having ext4 on a partition, but allowing some application to just write to blocks on the disk. It's not going to work right and will more than likely eff things up.
08:48 Norky so VM host -> client -> clustered filesystem -> bricks   can all be within one machine
08:48 zetheroo hmm ... so this is what df -h looks like on one of our two servers in a replica 2 http://paste.ubuntu.com/6009548/
08:48 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:48 Norky note that with libgfapi, qemu-kvm IS the client
08:48 JoeJulian +1
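A hedged sketch of what "qemu-kvm IS the client" looks like with libgfapi (requires qemu 1.3+ built with GlusterFS support; host, volume and image names are placeholders):
    qemu-system-x86_64 -m 2048 \
      -drive file=gluster://server1/gv0/images/vm1.img,if=virtio,cache=none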
08:49 zetheroo we have all our VM images located at /mnt/gluster/images/
08:49 JoeJulian if you've been writing to bricks, I expect your replicas to be out of sync.
08:49 mooperd joined #gluster
08:49 zetheroo which is where they are all running from ...
08:49 zetheroo is this wrong?
08:50 Norky it's not a good thing.
08:51 JoeJulian or you're just using the wrong terms and confusing us. Always a possibility.
08:51 zetheroo so glusterfs is just for storage replication?
08:51 Norky and distribution
08:51 zetheroo you cannot have live VM images working right off the gluster mount?
08:51 Norky and anything you can think of with the plugins (wrong word)
08:52 Norky JoeJulian, what's the gluster term for the modular components?
08:52 zetheroo Ubuntu 12.04 server ...
08:52 JoeJulian translators
08:52 Norky "VM images working right off the gluster mount"- by "gluster mount", do you mean brick?
08:53 zetheroo now I am not sure what I mean ... LOL
08:53 JoeJulian Ok, looking back at your info... saturn:/mnt/brick and mars:/mnt/brick are your bricks. Don't touch those.
08:53 zetheroo this is what df -h looks like http://paste.ubuntu.com/6009548/
08:53 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:53 zetheroo and our VM images are on /mnt/gluster/images/
08:54 Norky what does "gluster vol info gluster" tell you?
08:54 zetheroo the name of our gluster is simply "gluster"
08:54 Norky possibly a confusing name for your volume :)
08:54 zetheroo http://paste.ubuntu.com/6009563/
08:54 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:55 zetheroo Norky, agreed
08:55 JoeJulian Ok, then. You're accessing your volume not your brick. That's the correct way.
08:55 zetheroo oh ok - :P
08:55 JoeJulian At least you didn't call it a node. I'd have kicked you by now... ;)
08:56 zetheroo oops ... ok
08:56 zetheroo I would have maybe called one of the servers a node ...
08:56 zetheroo heh
08:56 Norky saturn:/mnt/brick and mars:/mnt/brick are bricks. Do not access them directly
08:56 JoeJulian Might as well call it a smurf.
08:56 ninkotech_ joined #gluster
08:56 edoceo joined #gluster
08:56 zetheroo Norky: right ... I only access /mnt/gluster/
08:57 Norky cool, then it sounds like you are doing things correctly
08:57 JoeJulian I wonder what happens if I set "use-readdirp" on a system without that patch...
08:57 zetheroo any great how-to's on setting up a glusterfs HA cluster? ... stuff I am finding looks a tad old ...
08:57 zetheroo Norky: great :)
08:58 Norky I'm not sure why you're seeing a problem
08:58 Norky your problem is that the two bricks appear to be out of sync, correct?
08:58 atrius joined #gluster
08:59 zetheroo Norky: I guess ... there is more free space on one than on the other
08:59 JoeJulian me neither. I've got all my VMs running on a mounted volume and I haven't had a problem since 3.3.0.
08:59 Norky you can access bricks by reading them, e.g. for comparison, btw, just not writing
08:59 JoeJulian oh, that one too
09:01 JoeJulian My expectation is that whenever we tell you a hard and fast rule, once you understand why you're breaking that rule go right ahead. :D
09:01 nightwalk joined #gluster
09:01 Norky so "df /mnt/brick" on mars returns something very different from  "5.5T  1.1T  4.5T  19% "?
09:04 zetheroo http://paste.ubuntu.com/6009582/
09:04 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:09 DV joined #gluster
09:09 Norky what does "gluster volume heal gluster info" tell you?
09:09 jporterfield joined #gluster
09:11 zetheroo http://paste.ubuntu.com/6009605/
09:11 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:14 Norky 19 presumably large files that are not up to date on mars, that explains the discrepancy
09:14 Norky do both machines host VMs?
09:14 zetheroo yes
09:14 zetheroo but each Vm is only running on one machine
09:15 thomasrt joined #gluster
09:15 mooperd joined #gluster
09:15 Norky I was wondering why the problem (files in need of healing) was only on one machine
09:18 zetheroo only one vm is actually running on mars, and its name is solforecast
09:22 rnts left #gluster
09:22 Norky is saturn, or rather, any of the VMs on it, especially busy?
09:22 zetheroo right now everything on the gluster is crawling ..
09:22 zetheroo there are 12 VM's running on saturn atm
09:27 Norky 12 vms on one host, and 1 on another seems a little... unbalanced
09:27 Norky it might just be that you're overloading it
09:28 Norky gluster is trying to heal those out of sync files, but with 12 vs 1 the majority of the traffic will be one-way
09:28 zetheroo 12 VM's ... but only 3 or 4 are actually busy ...
09:28 zetheroo hmm ok
09:28 zetheroo so gluster is automatically trying to heal the inconsistencies?
09:29 Norky I believe so
09:30 zetheroo is it opossible to pause the heal process?
09:30 Norky iotop should show that glustershd is consuming a significant %age of the available I/O
09:30 JoeJulian shouldn't often be out of sync though. The fuse client connects to both simultaneously.
09:31 Norky shouldn't be, but I'd guess the machine is thrashing
09:32 Norky what VM host s/w are you using? KVM/QEMU?
09:32 zetheroo kvm
09:33 zetheroo the gluster is absolutely crawling
09:33 duerF joined #gluster
09:33 zetheroo what would have caused the self heal daemon to go offline?
09:34 Norky in 'normal' operation, the SHD doesn't need to do anything
09:34 JoeJulian better question would be, what caused a disconnection between the client and the other server?
09:34 zetheroo sure, but I find it one fine day offline ...
09:34 zetheroo we had a power outage ... unscheduled ...
09:34 zetheroo but that was over two months ago
09:35 zetheroo so since then things have been getting out of sync
09:35 Norky the client writes to both servers simultaneously (and waits for a sync, correct, JoeJulian?)
09:35 ndarshan joined #gluster
09:35 JoeJulian Depends...
09:35 JoeJulian I think that qemu does fsync
09:35 Norky it's only if something like that outage happens, and things get out of sync, that the SHD needs to get involved - at least that's my simplified understanding
09:36 zetheroo did this command initiate the heal process? " gluster volume heal gluster info"
09:36 Norky no
09:36 zetheroo because it's only since then that everything is crawling
09:36 JoeJulian healing is a good thing. You do want to figure out how to let it happen.
09:37 mmalesa joined #gluster
09:37 zetheroo could the crawling have been caused by me changing the network ping timeout from 3 to 42 ?
09:37 JoeJulian no
09:37 JoeJulian how big are your images?
09:38 vpshastry joined #gluster
09:39 zetheroo this big: http://paste.ubuntu.com/6009644/
09:39 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:40 zetheroo ok I can do ls on saturn:/mnt/gluster/images/ ... but not in mars:/mnt/gluster/images/ ....
09:40 JoeJulian Ok, so you need to crawl through 765G and sync the changes to the other replica.
09:40 JoeJulian check the client log
09:40 zetheroo mars being the client?
09:41 zetheroo since I am mounting the gluster from saturn ...
09:41 zetheroo on mars df -h shows: http://paste.ubuntu.com/6009647/
09:41 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:43 JoeJulian When something done through a client mount doesn't do what's expected, you check the client log for that client.
09:43 zetheroo cli.log from saturn http://paste.ubuntu.com/6009653/
09:43 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:43 JoeJulian So you can't ls on mars. check mars' log.
09:43 JoeJulian client log is named for the mountpoint.
09:43 zetheroo cli.log on mars is empty
09:44 JoeJulian /var/log/glusterfs/mnt-foo-bar.log
09:44 zetheroo mnt-gluster.log
09:44 zetheroo ?
09:44 Norky yes
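In other words, the client log is named after the mount point with '/' replaced by '-', so for a volume mounted at /mnt/gluster:
    less /var/log/glusterfs/mnt-gluster.log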
09:44 ndarshan joined #gluster
09:45 zetheroo on mars I have  mnt-gluster-.log and mnt-gluster.log .... both are empty
09:45 zetheroo but I also have mnt-gluster.log.1 and mnt-gluster-.log.1
09:46 Norky mnt-gluster.log-earlierdate?
09:46 JoeJulian is your log partition full?
09:47 zetheroo last part of mnt-gluster-.log.1 http://paste.ubuntu.com/6009660/
09:47 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:47 Norky not according to http://paste.ubuntu.com/6009647/
09:47 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:48 JoeJulian @meh
09:48 glusterbot JoeJulian: I'm not happy about it either
09:48 zetheroo last part of mnt-gluster.log.1   http://paste.ubuntu.com/6009666/
09:48 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:48 zetheroo both from mars
09:49 JoeJulian 2013-08-16 is a bit old...
09:51 zetheroo yes
09:52 zetheroo on saturn the logs look more up-to-date
09:53 zetheroo on mars iotop shows about 16 entries of glusterfsd at the top
09:54 zetheroo I'll drop the self-heal-count me thinks
09:55 JoeJulian sounds like a good experiment
09:55 zetheroo dropped to 1 and still have 16 entries at the top of iotop
09:56 JoeJulian cool
09:56 zetheroo no ... ha
09:57 zetheroo it's killing everything running on the gluster
09:58 zetheroo I am going " gluster volume set gluster cluster.background-self-heal-count 1" ... why is it not working?
09:59 JoeJulian maybe it is?
09:59 JoeJulian maybe that io is coming from something else?
09:59 zetheroo how would I know it is or isn't working?
09:59 JoeJulian The empty log shouldn't be empty.
09:59 JoeJulian they're never empty.
09:59 zetheroo shouldn't there be 1 glusterfsd entry in iotop, and not 16 ?
10:00 JoeJulian Not necessarily. iirc there's 16 io-threads...
10:01 zetheroo ok, so 16 entries in iotop is not representing those 16 threads?
10:01 JoeJulian io threads, not necessarily self-heal threads.
10:02 JoeJulian You have nearly that many vms so it doesn't seem unreasonable from my perspective.
10:03 JoeJulian you could check gluster top
10:03 JoeJulian that might give you another clue
10:03 JoeJulian and I still come back to that missing log data....
10:04 JoeJulian I know when there's a problem writing to a log, things do slow down a lot.
10:06 zetheroo ok, I found out where the high IO is coming from
10:06 zetheroo there is a backup of a large VM image occurring :P
10:06 JoeJulian heh
10:06 zetheroo it runs from a crontab job ...
10:07 zetheroo started yesterday at 9pm ... garrrr
10:07 JoeJulian ugh
10:07 JoeJulian Oh... I don't know what time it is now, so maybe that's not that long... <shrug>
10:08 zetheroo it's 12:07pm the next day ;)
10:08 zetheroo so it's been going for about 13 hours!!
10:08 zetheroo LOL
10:08 zetheroo how does one find and stop such a process?
10:09 JoeJulian lsof?
10:10 zetheroo the cronjob is simply to run a script ...
10:11 zetheroo wow, lsof spits out a huge amount of stuff ...
10:11 Norky you'll want to restrict it to the file in which you're interested
10:11 zetheroo ok
10:12 zetheroo think I found it
10:12 zetheroo backup-ra  7223            root  255r      REG              259,0          164              2359454 /var/lib/libvirt/scripts/backup-rana
10:13 zetheroo is 7223 its PID?
10:15 zetheroo ok I killed backup-ra ... but the cp -r command is still running which was initiated by the script :P
10:16 Norky from where and to where is it copying?
10:17 zetheroo from saturn to a storage server on the network
10:17 Norky so no writing to the glusterfs volume?
10:18 zetheroo from the gluster on saturn to storage server ...
10:19 zetheroo in iotop what is the TID?
10:19 zetheroo can I use this to stop the command?
10:20 GLHMarmot joined #gluster
10:20 Norky thread id, no, I don't think so
10:21 zetheroo so there is no other option but to wait for this 190GB image to be copied over the network ... ? :(
10:22 JoeJulian kill the cp?
10:22 Norky pressing "p" in iotop will switch to a process view
10:23 zetheroo aha
10:23 zetheroo huh ... so in this case the TID and the PID were the same
10:24 zetheroo am I doing this right?
10:24 zetheroo kill PID 15206
10:24 Norky kill 15206
10:24 zetheroo ah
10:24 zetheroo doh
10:25 zetheroo ok it's gone!!!
10:25 zetheroo :)
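The sequence used here, as a sketch (the path and PIDs are the examples from this log, not values to reuse):
    lsof /var/lib/libvirt/scripts/backup-rana   # find the PID holding the backup script open
    kill 7223                                   # stop the backup script
    ps ax | grep '[c]p -r'                      # find the copy it had spawned
    kill 15206                                  # stop the cp as well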
10:26 Norky JoeJulian, I don't use gluster for VMs. What's the status of the qemu libgfapi support? Is it usable on a production system?
10:26 JoeJulian ready and rarin' to go...
10:27 Norky jolly good
10:27 JoeJulian I'm not using it yet either, but that's why I'm still up at 3:30am...
10:27 Norky I really must look into it sometime
10:28 Norky does it allow existing vm images that were accessed/created using the FUSE interface in 3.3 to be used, or does one have to create new images using the 'bd' interface?
10:28 JoeJulian I'm doing the 3.4 upgrade tonight. Only a few little difficulties.
10:28 Norky I wasn't quite certain from reading the documentation
10:29 ngoswami joined #gluster
10:30 JoeJulian 32 bit el5 client crashes. I'm not sure if that's just 32 bit, or just el5. Luckily they're api compatible so I can run the 3.3.2 client on the el5 servers.
10:33 zetheroo we also have one VM on saturn which writes large amounts of data to an image on the same server but onto the OS disk ... this is not helping things either ...
10:41 jporterfield joined #gluster
10:45 vpshastry left #gluster
10:47 spider_fingers joined #gluster
10:48 vpshastry joined #gluster
10:50 fleducquede joined #gluster
11:02 jporterfield joined #gluster
11:04 rwheeler joined #gluster
11:17 hagarth joined #gluster
11:23 CheRi joined #gluster
11:24 andreask joined #gluster
11:28 kkeithley joined #gluster
11:35 ppai joined #gluster
11:43 shruti joined #gluster
11:55 jporterfield joined #gluster
11:59 B21956 joined #gluster
12:00 manik joined #gluster
12:09 badone joined #gluster
12:13 an joined #gluster
12:13 ngoswami joined #gluster
12:14 CheRi joined #gluster
12:20 rgustafs joined #gluster
12:23 yinyin joined #gluster
12:23 zetheroo rebooted the host (saturn) due to libvirtd going defunct and now the gluster is once again 100% useless
12:29 chirino joined #gluster
12:31 B21956 joined #gluster
12:40 mmalesa joined #gluster
12:40 rastar joined #gluster
12:49 bennyturns joined #gluster
12:49 B21956 joined #gluster
12:51 zetheroo killed the gluster and are now running the VM's fine ...
12:51 zetheroo have to really rethink this whole gluster thing .. :-/
12:51 chirino joined #gluster
12:53 hagarth joined #gluster
12:59 sprachgenerator joined #gluster
13:01 chirino joined #gluster
13:03 bulde joined #gluster
13:05 samsamm joined #gluster
13:09 vpshastry1 joined #gluster
13:16 chirino joined #gluster
13:22 robo joined #gluster
13:23 hagarth joined #gluster
13:23 eseyman joined #gluster
13:25 aliguori joined #gluster
13:25 aliguori joined #gluster
13:27 hybrid5123 joined #gluster
13:29 spider_fingers left #gluster
13:30 chirino joined #gluster
13:32 vpshastry1 joined #gluster
13:35 failshell joined #gluster
13:40 chirino joined #gluster
13:43 bugs_ joined #gluster
13:50 chirino joined #gluster
13:53 georgeh|workstat having a strange problem, got a replicated volume, one of the servers self-heal log shows a transport endpoint disconnect this morning, since then it has cancelled all crawls for a specific subvolume which it claims went down (but is still available on both servers), subsequently the self-heal daemon crashed completely 2.5 hours later and is down...what is the best approach to restoring it?
14:01 georgeh|workstat is there a way to just restart the self-heal daemon?  or do I need to stop and start glusterfs?
14:02 asias joined #gluster
14:05 manik joined #gluster
14:05 duerF joined #gluster
14:06 chirino joined #gluster
14:14 satheesh1 joined #gluster
14:16 [o__o] joined #gluster
14:20 guigui1 joined #gluster
14:25 rwheeler joined #gluster
14:26 lpabon joined #gluster
14:27 chirino joined #gluster
14:28 meghanam joined #gluster
14:28 meghanam_ joined #gluster
14:30 kanagaraj joined #gluster
14:39 satheesh1 joined #gluster
14:41 lalatenduM joined #gluster
14:44 chirino joined #gluster
14:46 lala_ joined #gluster
14:51 mmalesa_ joined #gluster
14:52 rastar joined #gluster
14:52 lalatenduM joined #gluster
14:59 chirino joined #gluster
15:03 kaptk2 joined #gluster
15:07 plarsen joined #gluster
15:07 bennyturns joined #gluster
15:12 chirino joined #gluster
15:17 vpshastry1 left #gluster
15:18 zerick joined #gluster
15:20 chirino joined #gluster
15:23 an joined #gluster
15:27 TuxedoMan joined #gluster
15:27 TuxedoMan If I have 2 servers presenting bricks -- and my client is a newer version than my servers -- will that prevent me from mounting my brick on the client?
15:29 morse joined #gluster
15:31 zetheroo left #gluster
15:35 morsik depends on version
15:35 morsik 3.2 is incompatible with 3.3
15:36 morsik but if you have versions in the same compatibility line - then it'll mount
15:36 morsik afair 3.3 is compatible with 3.4 too
15:37 basic` yeah 3.3 can mount 3.4
15:37 LoudNoises joined #gluster
15:39 TuxedoMan Hmmm
15:39 TuxedoMan Maybe I was doing something wrong
15:39 TuxedoMan I was receiving some errors
15:39 TuxedoMan and volume info showed ok, connected
15:39 TuxedoMan showmount showed that my node1 was offering the brick...
15:39 TuxedoMan but I couldn't mount
15:40 Norky TuxedoMan, err, you shoudl be mounting the *volume*, not the brick
15:41 TuxedoMan that's what I meant
15:41 TuxedoMan sorry
15:41 Norky the volume is comprised of one or more bricks
15:41 guigui1 left #gluster
15:41 TuxedoMan ill give it a go and post some logs if I run into the same errors
15:41 Norky okay, no worries, just wanted to make sure it was a terminology problem and you weren't actually doing the wrong thing :)
15:43 andreask joined #gluster
15:45 gkleiman joined #gluster
15:46 chirino joined #gluster
15:47 sprachgenerator joined #gluster
15:47 JonnyNomad joined #gluster
15:47 jebba joined #gluster
15:48 jag3773 joined #gluster
15:51 hamnstar joined #gluster
15:52 dmojoryder joined #gluster
15:53 hamnstar hello folks - how do I start the glusterfs NFS service if it isn't running?
15:54 bala joined #gluster
15:57 chirino joined #gluster
16:10 vpshastry joined #gluster
16:10 semiosis hamnstar: restart glusterd
16:10 chirino joined #gluster
16:10 vpshastry left #gluster
16:13 Technicool joined #gluster
16:13 redragon_ joined #gluster
16:13 redragon_ @yum-repo
16:13 glusterbot redragon_: I do not know about 'yum-repo', but I do know about these similar topics: 'yum repo'
16:13 redragon_ @yum repo
16:13 glusterbot redragon_: The official community glusterfs packages for RHEL (including CentOS, SL, etc), Fedora 17 and earlier, and Fedora 18 arm/armhfp are available at http://goo.gl/s077x. The official community glusterfs packages for Fedora 18 and later are in the Fedora yum updates repository.
16:15 redragon_ JoeJulian, its your fault, now I'm using glusterbot as a bookmark tool
16:15 hamnstar semiosis: tried that, NFS service isnt starting with the others
16:16 semiosis hamnstar: you have at least one volume in "started" state?  check /var/log/glusterfs/nfs.log
16:18 hamnstar semiosis: yup, output of 'gluster volume status' is as follows: Brick (hostname) online, NFS server offline, self-heal daemon offline
16:18 hamnstar nfs.log doesnt get any additional content between glusterd restarts
16:25 chirino joined #gluster
16:27 hamnstar I think i might have made a shitty config... this is just a test environment, going to try re-doing my gluster config and take it from there.  thanks for your help
16:28 _pol joined #gluster
16:32 JoeJulian hamnstar: "... made a shitty config..." are you making .vol files?
16:33 hamnstar JoeJulian: not manually, but I didn't follow the actual gluster install guide.  I think i might have made an error when setting up the disk partitions that gluster is using as bricks?
16:33 Mo__ joined #gluster
16:33 hamnstar not sure how, because it worked after install... but after rebooting, no NFS
16:33 chirino joined #gluster
16:33 JoeJulian what distro?
16:33 hamnstar this is on xenserver 6
16:34 TuxedoMan silly question -- when probing peers -- do I need to peer probe node 1 from node 2 and vice versa? or just node 1 peer probes node 2 and that's enough?
16:34 JoeJulian @hostnames
16:34 glusterbot JoeJulian: Hostnames can be used instead of IPs for server (peer) addresses. To update an existing peer's address from IP to hostname, just probe it by name from any other peer. When creating a new pool, probe all other servers by name from the first, then probe the first by name from just one of the others.
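glusterbot's advice as commands, for a hypothetical three-server pool:
    # from server1, probe all the others by name:
    gluster peer probe server2
    gluster peer probe server3
    # then, from any one of the others, probe server1 by name so it is
    # recorded by hostname rather than IP:
    gluster peer probe server1
    gluster peer status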
16:34 JoeJulian hamnstar: iirc, that's based on RHEL6, right?
16:35 hamnstar JoeJulian: I believe so, yes... at the very least, it uses yum as a package manager :P
16:35 JoeJulian Asking because I found a problem with the client and EL5 last night.
16:35 hamnstar ah... as per version of RHEL, I don't know at the moment
16:37 TuxedoMan JoeJulian: so in a 2 node cluster -- both need to probe eachother -- in a larger environment node 1 probes all nodes -- then ANY single node needs to probe back node 1?
16:37 JoeJulian TuxedoMan: correct
16:37 TuxedoMan Thank you
16:41 basic` JoeJulian: okay..  http://pastebin.osuosl.org/3049/
16:41 glusterbot Title: #3049 OSL Pastebin (at pastebin.osuosl.org)
16:41 basic` JoeJulian: i'm seeing ~40 seconds to do a git status… help! :)
16:42 JoeJulian basic`: how many simultaneous git users do you expect to have using that volume?
16:42 JoeJulian in any one particular repo
16:42 basic` ~12
16:43 JoeJulian Well then you can't use eager locking...
16:43 Girth joined #gluster
16:43 basic` JoeJulian: actually… 1?
16:43 JoeJulian with 1 you can
16:43 basic` JoeJulian: since these are home directories
16:43 basic` i imagine only 1 access at a time
16:43 basic` with multiple copies of the same repo
16:44 Girth what would cause both self-heal and a rebalance to fail?
16:44 JoeJulian eager locking and there's one other that's not safe for multiple clients... ugh... up 'till 4:30 and still haven't had coffee...
16:46 nshaikh left #gluster
16:46 basic` JoeJulian: i'm also seeing this in the log [2013-08-21 16:37:44.360881] E [io-cache.c:557:ioc_open_cbk] 0-web-drupal-io-cache: inode context is NULL, is that okay?  It seems to happen occasionally
16:46 [o__o] joined #gluster
16:48 Girth joined #gluster
16:50 * JoeJulian shrugs
16:51 JoeJulian I'm not sure what inode context is...
16:52 basic` no problem, guessing it's not important
16:53 basic` do i need to remount for eager-lock: on to take effect?
16:53 asteriskmonkey1 joined #gluster
16:53 Girth joined #gluster
16:54 asteriskmonkey1 does gluster work like unionfs?
16:54 chirino joined #gluster
16:54 Girth_ joined #gluster
16:55 asteriskmonkey1 can someone aid me with a better understanding of gluster?
16:56 Girth left #gluster
16:56 asteriskmonkey1 i have a weird issue that im currently using unionfs to sort but i think gluster might be a better option
16:57 JoeJulian basic`: We did get you using readdirplus, didn't we?
16:58 basic` JoeJulian: yes you did
16:58 JoeJulian asteriskmonkey1 I suppose you could loosely say that it's like unionfs across networks.
16:59 JoeJulian but unlike unionfs, it's built for multiple clients
16:59 nightwalk joined #gluster
17:02 asteriskmonkey1 so if i wanted to hve the same video file distributed across 40 servers globally
17:02 chirino joined #gluster
17:02 asteriskmonkey1 gluster would be a nicer approach than my cronned unionfs syncs
17:02 asteriskmonkey1 :)
17:03 JoeJulian asteriskmonkey1: probably not
17:03 semiosis can geo-rep do fan out?
17:03 JoeJulian That's where I was going next... :D
17:04 asteriskmonkey1 ? or really?
17:04 asteriskmonkey1 im facing the issue where i need global sync of files across multiple servers for local access
17:05 asteriskmonkey1 trying to find the best solution
17:09 JoeJulian As long as you only need unidirectional replication, geo-replication would do that well.
17:11 basic` real    1m21.540s with eager-lock on :/
17:12 TuxedoMan joined #gluster
17:13 JoeJulian wow
17:13 JoeJulian guess that's not an improvement...
17:14 TuxedoMan So according to gluster.org their mounting in fstab includes the _netdev (http://gluster.org/community/documentation/index.php/Gluster_3.1:_Automatically_Mounting_Volumes)
17:14 glusterbot <http://goo.gl/klFrI4> (at gluster.org)
17:14 TuxedoMan Is there a ... benefit or a necessity? I didn't use that and my brick is mounted fine
17:15 semiosis TuxedoMan: what distro are you using?
17:15 TuxedoMan Wondering if there's some other feature or something that won't work if my fstab just has "defaults"
17:15 TuxedoMan http://gluster.org/community/documentation/index.php/Gluster_3.1:_Automatically_Mounting_Volumes
17:15 glusterbot <http://goo.gl/klFrI4> (at gluster.org)
17:15 TuxedoMan oops
17:15 TuxedoMan gfs1:/gv0 /gluster/brick  glusterfs defaults 0 0
17:15 TuxedoMan F19 semiosis
17:15 JoeJulian for el based distros _netdev postpones the mount until the network is started.
17:16 TuxedoMan Ahh, I see.
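So for EL-family systems the fstab line above would typically gain _netdev (same hypothetical names as TuxedoMan's example):
    gfs1:/gv0  /gluster/brick  glusterfs  defaults,_netdev  0 0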
17:17 krink joined #gluster
17:19 mmalesa joined #gluster
17:23 basic` i think for now… homedirs will stay on nfs
17:25 TuxedoMan I'm shocked glusterd doesnt have to be running on my clients in order to read the gluster brick -- there's no daemon at least
17:25 TuxedoMan I guess installing gluster-fuse does what it needs to do, eh?
17:25 dmojoryder with glusterfs 3.4 is ext4 the preferred file system now, or is it xfs (or doesn't matter)?
17:26 chirino joined #gluster
17:26 semiosis TuxedoMan: there is a daemon... a process which fuse calls to do work when you access the mount point
17:26 semiosis look at the ,,(processes) reported by ps
17:26 glusterbot The GlusterFS core uses three process names: glusterd (management daemon, one per server); glusterfsd (brick export daemon, one per brick); glusterfs (FUSE client, one per client mount point; also NFS daemon, one per server). There are also two auxiliary processes: gsyncd (for geo-replication) and glustershd (for automatic self-heal). See http://goo.gl/F6jqx for more information.
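A quick way to see those processes on a box (a sketch):
    ps ax | grep '[g]luster'
    # glusterd   - management daemon, one per server
    # glusterfsd - brick export daemon, one per brick
    # glusterfs  - FUSE client, one per mount point; also the NFS and self-heal daemons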
17:26 TuxedoMan process is glusterfs
17:27 TuxedoMan I didn't think it was running; when I do a 'service glusterfs status' it returns as dead, but my mount is up
17:27 ninkotech joined #gluster
17:27 gkleiman joined #gluster
17:28 TuxedoMan So what daemon is calling the process?
17:28 TuxedoMan brb call
17:33 RobertLaptop joined #gluster
17:33 lalatenduM joined #gluster
17:39 rcheleguini joined #gluster
17:41 chirino joined #gluster
17:49 semiosis TuxedoMan: daemon just means (to me) non-interactive process.  there's no system service for the mount, but there is a daemon
17:51 bulde joined #gluster
17:53 chirino joined #gluster
18:01 robo joined #gluster
18:03 robo joined #gluster
18:05 mmalesa joined #gluster
18:05 hagarth joined #gluster
18:06 zombiejebus joined #gluster
18:10 andreask joined #gluster
18:12 krink i'm experiencing Transport endpoint getting disconnected when extracting a large tar file.  I'm not using dns, just /etc/hosts files.  Are /etc/hosts files sufficient?
18:14 chirino joined #gluster
18:15 krink Also, the hosts are multi-homed.  does that make a difference?
18:16 ninkotech joined #gluster
18:17 ninkotech_ joined #gluster
18:19 lanning joined #gluster
18:20 aurigus joined #gluster
18:21 aurigus Which is a better gluster config for redundancy/expandability. 10gbe network on both. 2 x 24X2TB drive servers, or 8 x 4x2TB drive servers? Any reservations about either setup? Filesystem will be backed by zfs.
18:27 Technicool joined #gluster
18:38 pachyderm I have 8 bricks doing distributed replicate for a single volume. I keep getting "remote operation failed: No such file or directory" and "remote operation failed: Stale file handle." so I tried a self-heal and a rebalance. Both are just hanging and not actually doing anything. What could be causing this?
18:38 chirino joined #gluster
18:38 Excolo joined #gluster
18:42 Excolo hey, so i have a gluster installation working as replication with 2 servers (one at each datacenter we have). Its currently not being used by production machines, and nothing is copying into it. One of them, and I have no idea why, is using 100% CPU, and gluster is the only application running on it
18:42 Excolo the process with 100% usage is glusterfsd
18:49 chirino joined #gluster
18:59 ninkotech joined #gluster
18:59 ninkotech_ joined #gluster
19:01 johnmark channelstats
19:02 johnmark @channelstats
19:02 glusterbot johnmark: On #gluster there have been 172530 messages, containing 7305828 characters, 1220138 words, 4880 smileys, and 647 frowns; 1069 of those messages were ACTIONs. There have been 66087 joins, 2061 parts, 64036 quits, 21 kicks, 164 mode changes, and 7 topic changes. There are currently 199 users and the channel has peaked at 226 users.
19:05 robo joined #gluster
19:07 chirino joined #gluster
19:09 mmalesa joined #gluster
19:14 JoeJulian krink: Shouldn't matter. Check your client log for clues (/var/log/glusterfs/$(echo $mountpath | tr '/' '-'))
19:15 JoeJulian ~replica | aurigus
19:15 glusterbot aurigus: Please see http://goo.gl/B8xEB for replication guidelines.
19:15 ninkotech_ joined #gluster
19:15 JoeJulian pachyderm: check /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
19:15 ninkotech joined #gluster
19:16 awheeler joined #gluster
19:17 JoeJulian @later tell Excolo missed your window of patience by mere minutes. I would have asked what version of glusterfs and had you check your brick log for that brick. Say @processes in #gluster to learn what glusterfsd is.
19:17 glusterbot JoeJulian: The operation succeeded.
19:26 Recruiter joined #gluster
19:27 NeatBasis joined #gluster
19:32 TuxedoMan Do the server bricks all have to be the exact same size?
19:32 semiosis no
19:33 semiosis but it's easier to manage when they are imho
19:34 TuxedoMan If I grow the LVM bricks on the servers... how does the client see the change in size? remount?
19:34 chirino joined #gluster
19:34 neofob joined #gluster
19:35 semiosis the client always sees the sum of the minimum free of each replica set
19:37 TuxedoMan So I don't even have to remount the drive on the client?
19:38 TuxedoMan I'm about to test growing the 2 server bricks -- client side would just see the change? I figured I'd have to at least remount or something
19:39 semiosis you dont
19:39 TuxedoMan awesome
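A worked example of "sum of the minimum free of each replica set" (numbers invented): with two replica-2 sets, where set A's bricks report 400 GB and 450 GB free and set B's report 900 GB and 950 GB free, df on the client shows min(400,450) + min(900,950) = 1300 GB free.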
19:45 mooperd joined #gluster
19:50 chirino joined #gluster
19:52 mmalesa joined #gluster
19:54 TuxedoMan So when using replica and say node01 is servicing everything, and that node goes down, the cluster is crippled until that node comes back online?
19:55 rwheeler joined #gluster
19:55 semiosis node01 is not servicing everything (when using the FUSE client) it is only the ,,(mount server)
19:55 glusterbot The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds
19:57 TuxedoMan Right. My mount on my client is node01:/export/brick
19:57 TuxedoMan so when node01 goes down
19:57 TuxedoMan my client can't write to the mount
19:58 semiosis should only be for 42 seconds, the ,,(ping-timeout)
19:58 glusterbot The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
19:58 semiosis then the client should resume operating without that server
19:59 TuxedoMan so instead of rebooting the server ill try shutting it down... after 42 seconds what happens? the fuse client will redirect to node02?
20:00 rcheleguini joined #gluster
20:00 semiosis the fuse client is always working with all available replicas.  when a replica dies the client waits until ping-timeout for it to come back.  if it doesnt it marks the brick unavailable and resumes with the remaining replicas
20:01 semiosis writes always go to all replicas, and reads go to whichever replica replies fastest
20:01 semiosis roughly speaking
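The timeout itself is a per-volume option if 42 seconds doesn't suit (a sketch; lowering it is generally discouraged for the reason glusterbot gives above):
    gluster volume set gv0 network.ping-timeout 42
    gluster volume info gv0   # shows it under 'Options Reconfigured'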
20:04 chirino joined #gluster
20:05 ninkotech_ joined #gluster
20:06 ninkotech joined #gluster
20:08 TuxedoMan joined #gluster
20:09 TuxedoMan Cool -- shutting down node01 after that 42 seconds knew to go over to node02 -- as soon as node01 back online -- it brought the data over.
20:10 jbrooks joined #gluster
20:12 tqrst left #gluster
20:13 TuxedoMan So the data that was being written during that 42 second time out -- are just "lost"?
20:18 pachyderm can you add a node to the trusted server pool that doesn't have any storage and use that as the main nfs connect point?
20:21 chirino joined #gluster
20:22 TuxedoMan umm
20:23 TuxedoMan I think you need a export in order to add it to a volume (but I just started with gluster so don't trust that)
20:23 TuxedoMan I know when I'm making a volume I have to include the exports from each server, then go live with it
20:30 pachyderm I just want to stop my users from having to mount the gluster volume directly
20:33 chirino joined #gluster
20:33 TuxedoMan For what reason?
20:35 pachyderm they keep breaking it
20:36 TuxedoMan lol how
20:37 pachyderm I am getting file missing errors for no reason
20:37 jbrooks joined #gluster
20:38 pachyderm I think it is becasue they aren't properly mounting/unmounting the volume when using it
20:38 TuxedoMan interesting
20:39 TuxedoMan I wasn't aware there were certain ways of unmounting
20:39 pachyderm I have checked through the logs and I see issues with local locks failing, op_ctx modification failing, and stale file handles
20:40 pachyderm the stale file handles appear to be happening because of the way gluster throws a generic exception
20:40 pachyderm at least that is what I can ascertain from bug reports
20:43 nightwalk joined #gluster
20:45 badone joined #gluster
20:48 thomasrt @pachyderm: I think you're essentially asking if there's any reliability increase if you could add additional machines and partition the NFS access to all go through some trusted peers that aren't responsible for storage.
20:49 thomasrt the argument I'd see there is that any problem on the machine doing purely NFS wouldn't mean taking down a machine that handles storage too
20:50 TuxedoMan Couldn't you just mount the volume on a server dedicated to NFS and share that out?
20:50 TuxedoMan Kind of daisy chaining, but might accomplish what you want
20:50 chirino joined #gluster
20:51 TuxedoMan Question -- when I create my volume and I say.... replica 2 or 3 etc. say I add some servers, do I have to re-create that volume if I want to increase the number of nodes that keep a copy of the data?
20:52 semiosis pachyderm: "can you add a node to the trusted server pool that doesn't have any storage and use that as the main nfs connect point?" --> yes you can
20:52 thomasrt Protocol stacking is what I've seen that called TuxedoMan.  I've read reports with people having problems with that scenario.  Which is why I'd wonder if replacing the NFS stack with the one gluster is managing would improve things by simplifying the overall protocol stack
20:55 TuxedoMan any word on changing the replica number on an active volume?
20:57 semiosis TuxedoMan: that is done, confusingly, using the add-brick command.  i think it's in the ,,(rtfm)
20:57 glusterbot TuxedoMan: Read the fairly-adequate manual at http://goo.gl/E3Jis
20:58 _pol joined #gluster
21:00 TuxedoMan Ty
21:00 chirino joined #gluster
21:04 thomasrt How does self-heal interact with rebalance to fix layout?  Does rebalance make any assumptions that all the files are healed first or can you rebalance with some files still showing up as failed or split-brain?
21:04 JoeJulian I always fix split-brain before I do anything else.
21:05 TuxedoMan I'm not seeing how to up the replica count on my active volume on gluster.org
21:05 thomasrt good advice @joejulian
21:05 JoeJulian add-brick replica N {new bricks...} where N is the new replication level you want.
21:05 thomasrt what about issues left in heal info?  I've got a list of gfids but no paths
21:05 JoeJulian also works with remove-brick
21:06 TuxedoMan I did stumble upon JoeJulian.name tho ;)
21:06 JoeJulian That guy's a hack...
21:06 TuxedoMan JoeJulian: What if I dont want to add a brick though? What if I just want to up the number of replica's
21:06 TuxedoMan say I have 10 nodes and currently only 2 replica's and I want to add a 3rd
21:06 TuxedoMan but keep my resource pool at 10
21:07 TuxedoMan "trusted pool"
21:07 TuxedoMan sorry
21:07 JoeJulian thomasrt: Yeah, I try to heal those too. If a standard heal after fixing any split brain doesn't fix that, a heal full should.
21:07 JoeJulian ~brick order | TuxedoMan
21:07 glusterbot TuxedoMan: Replicas are defined in the order bricks are listed in the volume create command. So gluster volume create myvol replica 2 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 server4:/data/brick1 will replicate between server1 and server2 and replicate between server3 and server4.
21:07 JoeJulian It's not just "make 2 replicas anywhere"
21:08 TuxedoMan ahh
21:08 TuxedoMan so it's in 2's, in that order
21:08 JoeJulian right
21:08 TuxedoMan Hmm
21:08 JoeJulian So to make it 3s, you have to add bricks to expand those replica sets.
21:09 TuxedoMan So in my scenario with 10 nodes in my trusted pool currently replicating 2's
21:09 TuxedoMan How would I bump it to 4?
21:09 TuxedoMan By adding 2 more bricks and changing the replica to 4?
21:09 TuxedoMan with the add-brick
21:11 TuxedoMan i might be misunderstanding
21:11 JoeJulian assuming by "node" you mean server and not just any endpoint (the actual definition of "node") and each server currently has 1 brick, that would mean you have 10 bricks or 5 replica sets of 2. To make each of those replica sets have 4 bricks, you would need to add 10 bricks.
21:12 JoeJulian You can have more than one brick per server so you could actually create a replica 4 volume across your existing hardware.
21:12 JoeJulian ... and you did read ,,(replica) and are sure you actually need that much fault tolerance or load capability?
21:13 glusterbot Please see http://goo.gl/B8xEB for replication guidelines.
21:13 TuxedoMan Yes I read it, I'm just throwing scenario out to fully understand it
21:13 TuxedoMan and I did mean servers
21:13 JoeJulian ok
21:13 TuxedoMan say 10 servers -- 10 bricks  replica of 2
21:14 JoeJulian btw... I use "node" and "smurf" interchangeably.
21:14 TuxedoMan (my understanding) 2 servers will hold the data
21:14 TuxedoMan replicate* the data
21:14 TuxedoMan right?
21:14 JoeJulian right
21:14 TuxedoMan 2 servers in the trusted pool will have full copies of the data
21:14 chirino joined #gluster
21:14 TuxedoMan say (for some reason) those servers becom flakey
21:14 TuxedoMan and I want to add a 3rd server in the trusted pool to hold the data
21:15 TuxedoMan replicate*
21:15 JoeJulian 2 servers will have full copies of /some/ of the data.
21:15 TuxedoMan Now lets up those 2 servers, to 3.
21:15 JoeJulian The files will be distributed across the 5 replica sets.
21:15 TuxedoMan Gotchya
21:15 JoeJulian All three will hold the same set of files.
21:16 JoeJulian You can lose 2 servers without impact.
21:16 awheeler joined #gluster
21:16 jesse joined #gluster
21:16 * JoeJulian does replica 3
21:18 _pol joined #gluster
21:19 TuxedoMan Right
21:19 TuxedoMan I understand that part
21:19 TuxedoMan but right now I'm currently replica2 in my lab
21:19 TuxedoMan I want to up that to replica 3
21:19 TuxedoMan Do I have to rebuild the volume?
21:19 TuxedoMan I'm not trying to add anymore bricks or servers to the mix, only up my replica 2 to replica 3
21:21 jbrooks joined #gluster
21:22 jebba joined #gluster
21:22 andreask joined #gluster
21:24 TuxedoMan this guy kind of touches on it http://www.mail-archive.com/gluster-users@gluster.org/msg11802.html
21:24 glusterbot <http://goo.gl/Nb6czv> (at www.mail-archive.com)
21:25 TuxedoMan gluster volume add-brick <VOLUME> replica 3 <NEW-BRICK>
21:25 TuxedoMan my question is with the NEW-BRICK
21:26 TuxedoMan I dont want to add a new brick, just want to up the replica
21:26 JoeJulian You want to increase the replica count from 2 to 3 without giving glusterfs a target to place the 3rd replica on.
21:26 TuxedoMan ok
21:27 TuxedoMan so the new-brick is my choice of which brick to replicate it to
21:27 JoeJulian That works if you have a Schrodinger's Storage System.
21:27 JoeJulian right
21:27 TuxedoMan Gotchya
21:27 TuxedoMan and it doesn't necessarily have to be the 3rd one in line at the time of creating the volume
21:28 TuxedoMan that only matters if I say replica 4 on a 4+ server trusted pool
21:28 TuxedoMan or w/e the multiple is of the replica to brick
21:28 TuxedoMan got it
21:28 TuxedoMan thank you
21:28 JoeJulian Right... when you add-brick if you change the replica count it will shoehorn them into the correct place.
21:28 TuxedoMan That's where my confusion was.
21:28 TuxedoMan Thanks a lot
21:28 JoeJulian You're welcome.
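Putting the 10-server scenario together as a sketch (hypothetical volume and brick paths): 10 bricks in 5 replica-2 sets, going to replica 3, means adding 5 new bricks, one per set, listed in order; they could just as well be second brick directories on the existing servers:
    gluster volume add-brick gv0 replica 3 \
        server1:/data/brick2 server3:/data/brick2 server5:/data/brick2 \
        server7:/data/brick2 server9:/data/brick2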
21:29 * JoeJulian gets to work on designing a quantum drive that may or may not store the data you want, but you can't tell unless you look at that data.
21:29 johnmark nixpanic: hey. you there?
21:29 JoeJulian er, not "may or may not" but "does and does not"
21:29 johnmark JoeJulian: brilliant :)
21:30 johnmark nixpanic: you interested in presenting in Stockholm on September 4?
21:43 chirino joined #gluster
21:46 awheele__ joined #gluster
21:55 chirino joined #gluster
21:57 bryan__ joined #gluster
21:58 awheeler joined #gluster
21:59 asias joined #gluster
22:04 ninkotech joined #gluster
22:06 chirino joined #gluster
22:07 fidevo joined #gluster
22:23 chirino joined #gluster
22:24 bryan__ joined #gluster
22:38 jporterfield joined #gluster
22:41 chirino joined #gluster
22:47 tg2 schrodinger FS
22:48 tg2 lol
22:48 JoeJulian :)
22:48 JoeJulian Think of it! A drive that already has all the data you're ever going to want!
22:48 JoeJulian ... the only problem is filtering out all the data you don't want...
22:48 duerF joined #gluster
22:48 tg2 but it also contains all the child pornography in the world :(
22:48 tg2 ^
22:48 tg2 lol
22:49 tg2 [17:27] <JoeJulian> That works if you have a Schrodinger's Storage System.
22:49 a2_ your data is simultaneously existing and deleted
22:49 tg2 no jokes I didn't even read that when i said shrodingerfs
22:49 tg2 lol
22:49 JoeJulian hehe
22:50 JoeJulian even better
22:50 tg2 Joe how do you find the time to be 24/7 glusterfs irc customer support
22:50 tg2 but not to consult ;D
22:50 JoeJulian It's a hobby
22:50 tg2 that is an interesting conundrum
22:51 JoeJulian I'm working while I do this.
22:51 tg2 I am reminded of this
22:51 tg2 http://bash.org/?258908
22:51 tg2 lol
22:51 glusterbot Title: QDB: Quote #258908 (at bash.org)
22:52 JoeJulian lol
22:52 tg2 so i'm up to 500Tb now on gluster
22:53 JoeJulian My boss supports me doing this, btw...
22:53 asteriskmonkey1 joined #gluster
22:53 asteriskmonkey joined #gluster
22:54 tg2 he should mandate that your nick have the company name for free advertising
22:55 DV joined #gluster
22:55 JoeJulian I used to do that, but I changed it 'cause I was getting tired of being called Ed.
22:55 JoeJulian www.edwyse.com
22:55 asteriskmonkey left #gluster
22:56 JoeJulian a2_: If you have a minute, could you take a quick look at bug 999356
22:56 glusterbot Bug http://goo.gl/jsVaEo unspecified, unspecified, ---, amarts, NEW , client crash in el5 i386
22:58 chirino joined #gluster
23:02 tg2 I can never work for a company that supports croc
23:02 tg2 :D
23:02 JoeJulian lol
23:03 tg2 you guys are in desparate need of a responsive designer
23:03 tg2 lol
23:04 JoeJulian I'm in the process of turning the backend into an api. The architecture is complete legacy crap that the graphic artist and I just work around as best as possible.
23:04 a2_ JoeJulian, checking
23:04 JoeJulian Once we have an api, the layout won't be part of the code and it'll open up a lot of possibilities.
23:05 tg2 backend has to be api based then frontend can be separated
23:06 tg2 word from the wise, if you're doing SPA, use angularJS instead of jquery/backbone
23:07 JoeJulian I'm not a big fan of SPA for SEO reasons.
23:08 a2_ JoeJulian, do you have the core dump in the crash environment?
23:08 chirino joined #gluster
23:08 JoeJulian not right this moment. I had to reinstall 3.3.2 so I could get them up and running before morning.
23:09 JoeJulian I do still have the core dump, just not the environment.
23:09 tg2 alexa 2 million = probably not a huge fan of SEO as it is :D
23:09 JoeJulian tg2 where do you work?
23:09 tg2 if you use /#! for your spa google will render it
23:09 tg2 I own my own company
23:09 tg2 15 employees, all application development
23:10 a2_ JoeJulian, it's not clear where the crash is? was it a live process or a core dump from where you got the bt?
23:11 JoeJulian I couldn't get anything from the core dump so I ran a live process.
23:11 JoeJulian That's where it segfaulted.
23:12 a2_ oh
23:12 a2_ in #0  0x0011e25c in __glusterfs_this_location@plt () from /usr/lib/libglusterfs.so.0
23:12 a2_ had you done an ldconfig after upgrading to 3.4?
23:13 JoeJulian only if that's part of the rpm scripts
23:13 a2_ was this 3.4.0?
23:13 JoeJulian yes
23:13 a2_ let me check that code snapshot.. the line numbers have different code on master
23:15 a2_ crashing in __glusterfs_this_location() is very suspicious
23:15 awheele__ joined #gluster
23:19 JoeJulian yes, ldconfig is part of %post
23:19 chirino joined #gluster
23:20 a2_ do you know what file it is? it is a 358 byte file in the top level mount directory
23:21 JoeJulian hmm, any of 4 files or a directory...
23:21 a2_ it's a file
23:21 a2_ 358 bytes, in the / of the volume
23:22 a2_ can you look up the xattr dump of the file from all the backends?
23:22 JoeJulian Should match the ia_ino, right?
23:23 a2_ ia_ino is not the inode number from the backend
23:23 a2_ the mode is 777
23:23 a2_ size is 358 bytes
23:23 * JoeJulian grumbles that everything is 777...
23:24 a2_ find /mnt -maxdepth 1 -size 358c
23:24 a2_ shouldn't be too many files
23:24 JoeJulian right, like I said there's 4
23:25 a2_ ah
23:26 JoeJulian Ok, narrowed it down by timestamps.
23:27 a2_ cool
23:29 JoeJulian I don't think this will be helpful... http://ur1.ca/f5n43
23:29 glusterbot Title: #33913 Fedora Project Pastebin (at ur1.ca)
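For reference, the usual way to gather that per-brick xattr dump (brick path and filename are placeholders):
    getfattr -d -m . -e hex /mnt/brick/suspect-file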
23:29 SteveCooling ooh
23:30 SteveCooling bug 999356 may be what I had
23:30 glusterbot Bug http://goo.gl/jsVaEo unspecified, unspecified, ---, amarts, NEW , client crash in el5 i386
23:30 a2_ JoeJulian, same on all servers?
23:30 JoeJulian yes
23:30 a2_ SteveCooling, what did you have?
23:30 SteveCooling glusterfs 3.4 clients crash on el5 i386
23:31 SteveCooling i registered a bug, too.. lemme find the number
23:31 a2_ JoeJulian, did you test it only on i386, or did you test on both 64/32 and the crash happened only on i386?
23:31 SteveCooling it is not present in x86_64
23:31 SteveCooling i tested it
23:31 JoeJulian I don't have a 64bit el5 box set up and I was sleepy. :D
23:31 SteveCooling i am right now upgrading to 3.4 on a number of el5 x86_64 :)
23:31 a2_ hmm.. maybe a 32bit issue
23:32 SteveCooling bug 997902
23:32 glusterbot Bug http://goo.gl/RTWzcG unspecified, unspecified, ---, csaba, NEW , GlusterFS mount of x86_64 served volume from i386 breaks
23:32 JoeJulian I wonder if it could be in any way related to Emmanuel's
23:34 JoeJulian yep, that's the same one
23:34 SteveCooling I could NOT reproduce on 64 bit arch
23:35 SteveCooling which led me to upgrade my production system...
23:35 JoeJulian hmm... makes me wonder if it would break on 32bit el6...
23:36 SteveCooling interesting.. i would test, but cannot until tomorrow.
23:37 JoeJulian Oh, I know it works from 32bit Fedora 19 (not sure which kernel that is)
23:38 SteveCooling probably not related, but the noarch bit of the repo is missing for EL5
23:38 SteveCooling so that has to be disabled manually
23:39 SteveCooling also, on RHEL5, the releasever is "5Server", which does not have a directory, so that also needs manual fiddling to get working.
23:41 chirino joined #gluster
23:41 a2_ 997902 looks suspiciously similar to 999356
23:41 a2_ SteveCooling, are you the reporter of 997902?
23:42 gluslog_ joined #gluster
23:42 SteveCooling yes
23:43 a2_ SteveCooling, is the environment where the crash happened still easily available for inspection?
23:44 SteveCooling yes
23:44 JoeJulian btw... works in 32 bit el6
23:45 SteveCooling i installed a new i386 centos5 machine to reproduce the crash
23:45 JoeJulian So some structure change between 2.6.18 and 2.6.32?
23:45 a2_ SteveCooling, is it possible to let me ssh to the crashing server to inspect the core dump in gdb? i can be in a screen session
23:46 SteveCooling that can be arranged, but i need to finish the updates i'm running first. than i can bring the vm online again
23:46 SteveCooling *then
23:47 SteveCooling i'm on scheduled downtime here ;)
23:47 a2_ ok!
23:47 JoeJulian I'll install one at rackspace... it'll be a minute.
23:47 twx joined #gluster
23:48 SteveCooling but i don't think reproducing the problem is hard anywhere. the only thing i noticed is that my volume root dir worked. it had like two directories in it. inside both of them are about 800 more dirs. that's where it crashed
23:49 SteveCooling a simle ls in either will make it crash
23:49 SteveCooling *simple
23:49 JoeJulian So the first file it comes to maybe...
23:49 SteveCooling also, accessing a file inside one of them, where the path is already known
23:50 SteveCooling it could be files, but i cannot say that for sure
23:50 a2_ ...
23:50 a2_ ok
23:56 chirino joined #gluster
23:59 a2_ JoeJulian, are you installing a vm for this test?
23:59 JoeJulian yes
