IRC log for #gluster, 2014-04-21

All times shown according to UTC.

Time Nick Message
00:01 ninkotech joined #gluster
00:08 chirino joined #gluster
00:14 yinyin_ joined #gluster
00:19 theron joined #gluster
00:54 theron joined #gluster
01:44 harish joined #gluster
02:07 haomaiwa_ joined #gluster
02:34 vpshastry1 joined #gluster
02:40 vpshastry1 left #gluster
02:47 bharata-rao joined #gluster
03:09 theron joined #gluster
04:09 nueces joined #gluster
04:10 amukherj__ joined #gluster
04:17 kdhananjay joined #gluster
04:18 kanagaraj joined #gluster
04:32 vpshastry1 joined #gluster
04:32 ppai joined #gluster
04:33 Ark joined #gluster
04:35 aravindavk joined #gluster
04:50 Matthaeus joined #gluster
04:56 ravindran1 joined #gluster
05:01 RameshN joined #gluster
05:06 ndarshan joined #gluster
05:07 kumar joined #gluster
05:10 davinder joined #gluster
05:17 pvh_sa joined #gluster
05:19 saurabh joined #gluster
05:26 deepakcs joined #gluster
05:28 tg2 anybody around?
05:29 tg2 have an odd issue where I have files in a directory but the path to that directory doesn't exist through my gluster mount
05:29 tg2 ie: /path/to/this/file.txt exists, i can do ls and read it.
05:29 kseifried huh
05:29 tg2 but when in /path/to
05:29 tg2 ls -l
05:30 tg2 doesn't show anything
05:30 kseifried what's the mount point?
05:30 tg2 ok the actual path is /storage/
05:30 tg2 that is the gluster mount
05:31 tg2 basically i have a ghost folder... one of the clients wrote some data there, it exists if i type in cd /storage/downloads/11111/subfolder
05:31 nshaikh joined #gluster
05:31 tg2 i get the folder
05:31 tg2 ls -l lists all the files in there
05:31 tg2 but if I go to /storage/downloads
05:31 tg2 there is no 11111 folder in there
05:32 tg2 likewise if i go to /storage/downloads/11111 it exists, but I don't see the subfolder in there.
05:34 tg2 this is a bit of an issue because if client 1 adds a directory tree in there
05:34 tg2 and client 2 checks it with ls -l
05:34 tg2 client 2 sees nothing
05:34 tg2 and indexes nothing
05:34 tg2 yet the files are still on disk on one of the bricks
05:35 tg2 should I do a fix-layout rebalance?
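As background on that question: a fix-layout rebalance recalculates the DHT layout xattrs on directories without moving any file data, and is the usual first step when directory entries created behind gluster's back fail to show up through the mount. Whether it resolves this particular "ghost folder" symptom isn't confirmed in the log. A minimal sketch, with VOLNAME standing in for the volume behind the /storage mount (the real name isn't given):

    gluster volume rebalance VOLNAME fix-layout start
    gluster volume rebalance VOLNAME status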
05:38 raghu joined #gluster
05:38 prasanth_ joined #gluster
05:53 ngoswami joined #gluster
05:59 Ark joined #gluster
06:03 lalatenduM joined #gluster
06:07 rastar joined #gluster
06:16 dusmant joined #gluster
06:25 Honghui joined #gluster
06:26 Honghui hello, we are building a glusterfs cluster, should we use raid or raw disk?
06:27 Honghui For 12 disks, 3TiB each, should we group them into 2 raid6?
06:33 psharma joined #gluster
06:36 rjoseph joined #gluster
06:36 keytab joined #gluster
06:44 haomaiw__ joined #gluster
06:45 tg2 single R6 with 12 disks is decent if they are enterprise disks
06:46 tg2 http://www.smbitjournal.com/2012/11/choosing-a-raid-level-by-drive-count/
06:48 wgao joined #gluster
06:48 dusmant joined #gluster
06:55 ctria joined #gluster
06:58 ekuric joined #gluster
07:02 ricky-ticky joined #gluster
07:02 itisravi joined #gluster
07:05 edward1 joined #gluster
07:05 samppah Honghui: currently red hat recommends 12 disk raid6 arrays with RHS
07:06 edong23 joined #gluster
07:21 haomaiwang joined #gluster
07:24 ravindran1 joined #gluster
07:38 aravindavk joined #gluster
08:00 dusmant joined #gluster
08:02 awktane joined #gluster
08:05 awktane Having issues with gluster 3.4 on ubuntu 14.04. It won't connect to an existing cluster of 4 bricks running the exact same version from the PPA but under ubuntu 12.04. Cleaned /var/lib/glusterd a few times and restarted. When probing from any of the 4 bricks I get a peer rejected (connected) but the new ubuntu 14 brick never sees any other peers. Just the one it was rejected by.
08:16 pvh_sa joined #gluster
08:18 kanagaraj joined #gluster
08:19 vimal joined #gluster
08:35 awktane Looks like I'm the only crazy person up at this hour. I'll try a little later! :)
08:38 doekia @awktane: did you check firewall rules and/or revision of your xlators?
08:39 awktane Firewall open. reversion of xlators?
08:39 awktane err revision
08:43 kseifried Honghui depends on what you need.
08:44 kseifried like I mostly use raid 1+ (e.g. replicas), as I care about reliability mostly
08:44 doekia I mean did you check that basic connectivity succeed in between your 14.04 vs 12.04
08:44 doekia and that the xlators are the exact same version?
08:44 kseifried what version of gluster is on 14 and 12? I'm guessing way out of synch
08:45 doekia @kseifried +1, same guess here
08:46 awktane I'm in the process of upgrading our cluster to ubuntu 14 across the board because there are a bunch of packages that would make our lives a lot easier. I was hoping to migrate the bricks over to the new servers. Not sure how to check translator versions. I watched a tcpdump conversation between the two so confirmed connectivity.
08:46 kseifried glusterd -v? --version?
08:47 kseifried and for gluster updates I create new bricks and rsync
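A rough sketch of the new-bricks-and-rsync approach just mentioned, with every path and hostname hypothetical. Copying through client mounts lets GlusterFS manage its own metadata; a brick-level copy would additionally need hard links and extended attributes preserved, since the .glusterfs tree and the trusted.* xattrs on the bricks depend on them:

    # via client mounts of the old and new volumes (simplest)
    rsync -av /mnt/oldvol/ /mnt/newvol/

    # brick-to-brick copy, preserving hardlinks, ACLs and xattrs
    rsync -aHAXv /data/brick1/ newserver:/data/brick1/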
08:47 awktane Identical.
08:48 doekia I have faced some issues during upgrades. I usually run in degraded mode during such... force sync all nodes, migrate one valid node, shut down the other, migrate a second, etc...
08:48 awktane Yeah that's what I was going to do next. Was hoping to avoid it. hehe
08:48 kseifried I like to do upgrades with full system to fall back on cause then things don't go all kaboom =)
08:49 kseifried well you could add a machine, replicate data, then split it off and base new servers off of it
08:49 doekia Backup is your fallback during that phase
08:50 doekia At least that is the way I manage upgrade
08:51 lalatenduM awktane, when you say "It won't connect to an existing cluster of 4 bricks running the exact same version from the PPA but under ubuntu 12.04", are you trying to peer probe from the new host to a host from the trusted pool?
08:51 saravanakumar joined #gluster
08:52 awktane Trusted pool member is probing new brick.
08:52 lalatenduM awktane, or you are doing a peer from one of the nodes of trusted cluster to the new server
08:52 lalatenduM awktane, ok, this should work
08:52 lalatenduM awktane, what is the error peer probe is returning
08:53 awktane Probe shows Rejected (Connected) on trusted member. Does not migrate the peer to the other trusted though.
08:54 awktane Didn't see anything in the logs on either end to give me any direction.
08:54 lalatenduM awktane, seems like a firewall issue, We need to make sure iptables does not block any requests
08:54 ravindran2 joined #gluster
08:56 awktane Opened firewall, then dropped firewall with no success. Can see the convo on tcpdump going back and forth. I'll double check.
09:01 awktane @lalatenduM Universe for 14 version is 3.4.2. PPA is 3.4.3. Ok just to use universe rather than PPA?
09:02 lalatenduM awktane, yeah should be fine
09:02 lalatenduM 3.4.3 has some extra bug fixes on top of 3.4.2
09:06 awktane Just doing a clean install of 14 to remove all of the other stuff I was doing to the new servers. That way clean slate.
09:06 vpshastry1 left #gluster
09:15 awktane Yeop no go. Same result. Firewall checked and double checked. Cleared iptables and -P ACCEPT for all. No difference. Nothing in logs.
09:16 awktane lalatenduM Hmm except for syslog - [rdma.c:4102:gf_rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
09:18 awktane Unrelated
09:20 vpshastry joined #gluster
09:25 lalatenduM awktane, do you have selinux in the new server ( I think Ubuntu does not ship selinux by-default)
09:25 awktane Doesn't ship.
09:31 harish joined #gluster
09:31 awktane Tried to remove everything in /var/lib/glusterd except uuid file. Restarted, probed. All existing bricks show but all show peer rejected. There's something it doesn't like.
09:33 aravindavk joined #gluster
09:52 bharata-rao joined #gluster
09:56 haomaiwa_ joined #gluster
09:57 baojg_ joined #gluster
09:59 kanagaraj joined #gluster
10:06 vpshastry1 joined #gluster
10:08 vpshastry3 joined #gluster
10:14 snehal joined #gluster
10:18 baojg joined #gluster
10:22 harish joined #gluster
10:42 mjrosenb joined #gluster
10:44 glusterbot New news from newglusterbugs: [Bug 1089642] Quotad doesn't load io-stats xlator, which implies none of the logging options have any effect on it. <https://bugzilla.redhat.com/show_bug.cgi?id=1089642>
10:45 nshaikh joined #gluster
10:46 baojg_ joined #gluster
10:53 DV joined #gluster
10:55 Andy5 joined #gluster
10:55 rastar joined #gluster
11:02 awktane To reproduce and confirm, created another machine under Ubuntu 12.04 like the others with glusterfs-server 3.4.3 from PPA like the others. Same result. Peer Rejected from all bricks.
11:03 baojg joined #gluster
11:09 rastar joined #gluster
11:12 diegows joined #gluster
11:16 ppai joined #gluster
11:16 ira joined #gluster
11:17 ravindran1 joined #gluster
11:19 Andyy2 joined #gluster
11:22 lalatenduM awktane, seems like a bug, I would suggest you file a bug and send it to the gluster-dev mailing list
11:22 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
11:22 lalatenduM @mailinglists
11:22 glusterbot lalatenduM: http://www.gluster.org/interact/mailinglists
11:24 jan joined #gluster
11:25 gothos_ How annoying.
11:26 awktane @lalatenduM Found a bug already. Playing to see if I can get around it. https://bugzilla.redhat.com/show_bug.cgi?id=1072720
11:26 glusterbot Bug 1072720: medium, unspecified, ---, ravishankar, MODIFIED , gluster peer probe results in peer rejected state
11:27 gothos_ Hello :) I'm currently benchmarking a glusterfs two machine setup with 2x RAID60 with around 54TB of usable space and 7k2 3TB drives in the background.
11:27 gothos_ And I'm wondering if there is some way to speed up random access (read and write) and mixed workloads
11:27 Andy5_ joined #gluster
11:28 Andyy3 joined #gluster
11:28 gothos_ My iozone tests with a 8KB block size give be partially only about 7-15MB/s
11:28 gothos_ *me
11:28 gothos_ sequential access is fine with around 750MB/s
11:29 gothos_ oh, it's replicating the data between the two machines
11:29 lalatenduM awktane, that's good news :)
11:30 awktane I wonder how long it'll take before that change hits the release version...
11:30 lalatenduM awktane, you should put a comment in the bug and request a backport to the 3.4 branch, so that the fix will be in 3.4.4
11:30 awktane @gothos What kind of files are you working with? Small? Large?
11:30 lalatenduM I mean you should request a backport of this fix to 3.4.4 on the gluster-devel mailing list
11:31 lalatenduM or clone the bug, with version 3.4.3
11:31 gothos_ awktane: I'm currently testing with huge files, around 45GB to circumvent main memory caching, in part anyway. But data is written/read in 8KB blocks. but after testing I'll have all sorts of data on the machine
11:32 gothos_ since it is supposed to be a data backend for researchers and they are kinda uncontrollable
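For context, the kind of iozone run being described (8 KB records against a file large enough to defeat the page cache) would look roughly like the following; the mount point and sizes are illustrative, not taken from the log:

    # -i 0 = write/rewrite, -i 1 = read/reread, -i 2 = random read/write
    iozone -i 0 -i 1 -i 2 -r 8k -s 45g -f /mnt/gluster/iozone.tmp
    # adding -I (O_DIRECT) is another way to keep the page cache out of the numbers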
11:33 lalatenduM getting the fix in 3.4 or 3.5 will be quick as bug fiz releases are faster than a new release
11:33 lalatenduM s/fiz/fix/
11:33 glusterbot What lalatenduM meant to say was: getting the fix in 3.4 or 3.5 will be quick as bug fix releases are faster than a new release
11:33 awktane @gothos_ Striped volumes are more aimed at that I believe but I don't have any experience with them myself.
11:34 gothos_ awktane: sure, I guess that would be faster, but we also need the redundancy since we can't properly backup 54TB of data
11:35 Andyy3 sorry I'm late. are you guys saying that there's a bug in the striped translator of gluster?
11:35 gothos_ I believe it's possible to do both, not sure tho? but we don't have the hardware at the moment
11:35 awktane @gothos_ Yeah you can do both at the same time but need additional hardware.
11:35 lalatenduM awktane, check backport wish list, http://www.gluster.org/community/documentation/index.php/Backport_Wishlist#Requested_Backports_for_3.4.4, you can add your request too
11:35 glusterbot Title: Backport Wishlist - GlusterDocumentation (at www.gluster.org)
11:36 aravindavk joined #gluster
11:39 kkeithley joined #gluster
11:43 Ark joined #gluster
11:54 vpshastry1 joined #gluster
11:56 tdasilva joined #gluster
12:09 aravindavk joined #gluster
12:12 ppai joined #gluster
12:14 edward1 joined #gluster
12:29 latha joined #gluster
12:40 B21956 joined #gluster
12:47 chirino joined #gluster
12:55 plarsen joined #gluster
13:14 glusterbot New news from newglusterbugs: [Bug 1089668] DHT - rebalance - gluster volume rebalance status shows output even though User hasn't run rebalance on that volume (it shows remove-brick status) <https://bugzilla.redhat.com/show_bug.cgi?id=1089668>
13:28 vpshastry joined #gluster
13:30 dbruhn joined #gluster
13:35 kkeithley portante: ping, python setup.py install question
13:51 Psi-Jack_ joined #gluster
13:52 kkeithley portante: unping
13:52 glusterbot kkeithley: Please don't naked ping. http://blogs.gnome.org/mark​mc/2014/02/20/naked-pings/
13:52 jmarley joined #gluster
13:52 jmarley joined #gluster
13:53 kkeithley oh glusterbot, I thought you were smarter than that
13:54 lmickh joined #gluster
13:57 ndk joined #gluster
14:00 jobewan joined #gluster
14:01 ctria joined #gluster
14:04 ira joined #gluster
14:11 wushudoin joined #gluster
14:14 gmcwhistler joined #gluster
14:15 glusterbot New news from newglusterbugs: [Bug 1089676] drc cache failed to detecte duplicates <https://bugzilla.redhat.com/show_bug.cgi?id=1089676>
14:16 harish joined #gluster
14:17 Jakey joined #gluster
14:19 kkeithley portante: ping, python setup.py install question (again, after all)
14:22 snehal joined #gluster
14:32 uebera|| joined #gluster
14:32 uebera|| joined #gluster
14:45 jbrooks joined #gluster
14:49 MeatMuppet joined #gluster
14:57 lpabon joined #gluster
14:57 sauce joined #gluster
15:00 Andy5 joined #gluster
15:01 Andy5_ joined #gluster
15:01 sauce i just did my first glusterFS, pretty cool. i always wanted something like this
15:01 dbruhn congrats!
15:01 sauce i had a need to do a multi-AZ filesystem in AWS
15:01 * sauce bows
15:02 gmcwhist_ joined #gluster
15:02 sauce i'm thinking of having 2+ instances dedicated to being "gluster servers" and then connecting clients to them
15:02 sauce many of the examples i find online have the client/server on the same box
15:03 dbruhn There are a lot of people who do it both ways
15:03 Psi-Jack_ joined #gluster
15:03 aravindavk joined #gluster
15:07 gmcwhis__ joined #gluster
15:13 ctria joined #gluster
15:18 jag3773 joined #gluster
15:18 Andy5 joined #gluster
15:28 Andy5_ joined #gluster
15:28 daMaestro joined #gluster
15:29 Andy5__ joined #gluster
15:29 Honghui joined #gluster
15:34 Andy5 joined #gluster
15:44 Andy5 @pingtimeout
15:44 glusterbot Andy5: I do not know about 'pingtimeout', but I do know about these similar topics: 'ping-timeout'
15:46 sauce i stuffed my first gluster setup
15:46 sauce dd if=/dev/zero of=test99 bs=8k count=99999
15:48 sauce the bricks are single-EBS volumes, they can't handle the I/O
15:49 japuzzo joined #gluster
15:51 uebera|| joined #gluster
15:58 ctria joined #gluster
16:13 systemonkey joined #gluster
16:13 Mo___ joined #gluster
16:19 kmai007 joined #gluster
16:20 Matthaeus joined #gluster
16:22 kmai007 has anybody found any documentation on glusterfs 3.5 ?
16:22 kmai007 please share it with me
16:24 MeatMuppet joined #gluster
16:24 scuttle_ joined #gluster
16:25 vpshastry1 joined #gluster
16:28 JonnyNomad joined #gluster
16:29 jbd1 joined #gluster
16:40 [o__o] joined #gluster
16:42 failshell joined #gluster
16:44 [o__o] left #gluster
16:45 [o__o] joined #gluster
16:47 jbd1 has anyone here done a "live" upgrade of 3.3 to 3.4?  I'm preparing to do so and would prefer not to have to shut down my whole site during the upgrade
16:50 vpshastry1 joined #gluster
16:51 hagarth joined #gluster
16:57 SFLimey joined #gluster
16:58 kanagaraj joined #gluster
16:59 uebera|| joined #gluster
17:00 Andy5_ joined #gluster
17:11 mjsmith2 joined #gluster
17:26 asku joined #gluster
17:28 semiosis :O
17:28 semiosis d3vz3r0: just saw your pm, sorry i missed it earlier, i'm not in SF anymore :/
17:29 awktane @jbd1 Sorry for the delay. I did not too long ago on Ubuntu. The existing bricks work but I can't probe any new ones nor can I set any options.
17:31 jbd1 awktane: that sounds ominous.  Did you reboot all your bricks after upgrading to 3.4.x?
17:32 awktane Yeop. One at a time. Rolling restart.
17:32 awktane I get https://bugzilla.redhat.com/show_bug.cgi?id=1072720
17:32 glusterbot Bug 1072720: medium, unspecified, ---, ravishankar, MODIFIED , gluster peer probe results in peer rejected state
17:33 semiosis awktane: see ,,(peer rejected)
17:33 glusterbot awktane: I do not know about 'peer rejected', but I do know about these similar topics: 'peer-rejected'
17:33 semiosis awktane: see ,,(peer-rejected)
17:33 glusterbot awktane: http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
17:34 awktane Yeop. When I peer probe at first it only says rejected on one server. When I go through those steps then it does migrate to all bricks but then all bricks say peer rejected.
17:35 awktane Volume files are identical. Tried under Ubuntu 12 and 14. No luck. I'll be looking into it more later then reporting as necessary.
17:35 awktane Also experiencing https://bugzilla.redhat.com/show_bug.cgi?id=977497 on those servers which says fixed.
17:35 glusterbot Bug 977497: unspecified, high, 3.4.0, kparthas, NEW , gluster spamming with E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused) when nfs daemon is off
17:36 jbd1 1072720 looks like a showstopper.  I won't upgrade if I can't add a peer afterward.
17:37 awktane Worst case I bring the files onto one distributed and then re-replicate them with a new vol file.
17:37 * jbd1 goes to reproduce this in vms
17:37 semiosis jbd1: you should try upgrading a test cluster and see if it works for you.  plenty of people have upgraded without any problem
17:38 jbd1 semiosis: I did that, but I didn't try probing a new peer post-upgrade.  So now I get to do it all over
17:38 awktane Yeop which makes me think it's one of my options or something but umm... can't remove options either so no way to easily find out without bringing everything down.
17:38 semiosis awktane: worst case is you take down the cluster, delete everything from /var/lib/glusterd on all servers, then re-probe & recreate the volumes... just leave the data in place on the bricks.  as long as you recreate the volumes exactly as they are now you'll be OK
17:39 awktane @semiosis Gluster will sort out that half files are on one and half on the other?
17:39 badone joined #gluster
17:40 awktane I suppose that's its job...
17:40 semiosis awktane: peer rejected means that the servers do not agree on some volume config.  the solution is to wipe out the volume config on the rejected peers (or all peers except one) then re-sync the volume configs as per ,,(peer-rejected)
17:40 glusterbot awktane: http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
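The linked page boils down to roughly the steps below, run on the rejected peer only; the service name differs by distro (glusterd on RHEL/CentOS, glusterfs-server on Debian/Ubuntu) and <good-server> is a placeholder for any healthy pool member:

    service glusterd stop
    cd /var/lib/glusterd && ls | grep -v '^glusterd.info$' | xargs rm -rf   # keep only glusterd.info (the node's UUID)
    service glusterd start
    gluster peer probe <good-server>
    service glusterd restart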
17:40 awktane No meta data.
17:40 semiosis awktane: as long as the bricks are added in the same order you'll be OK
17:40 awktane Rejected peers are brand new. No volume file info.
17:41 awktane Clean /var/lib/glusterd
17:41 semiosis that doesnt make sense
17:41 awktane Nope which is why I thought it could be related to that bug.
17:41 semiosis on one of those new servers, stop glusterd, truncate the /var/log/glusterfs/etc-glusterfs-glusterd.log file, start glusterd, and pastie that log file please
17:43 awktane @semiosis Roger Roger
17:43 awktane I'll do fresh install/log so I can just throw the whole log up.
17:47 awktane @semiosis http://pastie.org/9098095
17:48 awktane @semiosis Now it shows State: Peer Rejected (Connected) on new server and old. Did not migrate the peer to all existing servers. If I wipe out /var/lib/glusterd then it does migrate to all but then all say rejected.
17:53 uebera|| joined #gluster
17:59 vpshastry joined #gluster
17:59 neofob joined #gluster
18:01 mjsmith2 Anyone have suggestions for debugging why volume start would fail? Unfortunately the hosts are no longer running so all I have are the log files.
18:02 awktane Anything fun in the log files?
18:04 B219561 joined #gluster
18:06 awktane @semiosis By adding the bricks again in the same order by the way you mean same order they're listed in gluster volume status, correct?
18:07 mjsmith2 awktane: nothing prior to the volume start failing
18:08 awktane @mjsmith2 Which version of gluster?
18:08 mjsmith2 3.4
18:09 awktane @mjsmith2 And you've tried to restart glusterd?
18:10 mjsmith2 No, this was a nightly test that failed. 90% of the time its fine
18:11 awktane @mjsmith2 The only time I've seen that happen a restart of glusterd woke it up. No idea what was wrong at the time though.
18:11 zaitcev joined #gluster
18:13 mjsmith2 Yeah, I'm sure a restart would have fixed it but it's nice to know why it failed in the first place
18:14 rotbeard joined #gluster
18:20 awktane @mjsmith2 Gotta love it when the log files don't log details of a failure.
18:25 JoeJulian Looks like you didn't follow directions so now you have a new uuid.
18:26 chirino joined #gluster
18:28 sputnik1_ joined #gluster
18:28 sputnik1_ is anyone using nfs to mount gluster on windows?
18:29 [o__o] left #gluster
18:33 dbruhn sputnik13net, I've tested it and it works well in that, haven't used it in production
18:34 dbruhn sorry misread that
18:34 dbruhn I used samba
18:34 sputnik13net oh, ok
18:34 sputnik13net :(
18:34 awktane @JoeJulian You talking to me? If so, then trusted members do not have it as a peer therefore uuid doesn't matter. gluster.info is kept and therefore uuid is the same during the restart that migrates the peer to all as per peer-rejected. gluster version identical on both ends.
18:34 sputnik13net I wonder whether gluster is using nfsv3 or v4
18:35 JoeJulian [2014-04-21 17:44:50.260619] E [store.c:394:gf_store_handle_retrieve] 0-: Unable to retrieve store handle /var/lib/glusterd/glusterd.info, error: No such file or directory
18:35 dbruhn sputnik13net, in testing I was using CTDB in front of it and it made it resilient to single node failures, that coupled with rrdns worked well.
18:35 sputnik13net what's ctdb
18:35 awktane @JoeJulian No files exist at all - it's a fresh brick.
18:35 kkeithley sputnik13net: no need to wonder. Gluster's built-in NFS server is NFSv3.
18:35 dbruhn someone can correct me, but I believe it is still using v3
18:35 kkeithley If you want NFSv4, get nfs-ganesha and use the Gluster gfapi FSAL.
18:36 sputnik13net I don't need nfsv4, just wanted to know which version gluster is using
18:36 dbruhn ctdb, creates shared virtual IP addresses and creates a locking mechanism so if a connection drops after a timeout the samba client will start working off of one of the other machines
18:36 JoeJulian awktane: Are you trying to probe from a new server ,,(glossary) to an existing server in the trusted pool?
18:36 glusterbot awktane: A "server" hosts "bricks" (ie. server1:/foo) which belong to a "volume"  which is accessed from a "client"  . The "master" geosynchronizes a "volume" to a "slave" (ie. remote1:/data/foo).
18:37 sputnik13net dbruhn: ic, thx for the pointer, but I'm using primarily linux boxes, just have a windows jumpbox that I want to have access to our shared storage :)
18:38 sputnik13net so I'd prefer to nfs mount the gluster volume on the windows box
18:38 dbruhn For sure
18:38 awktane @JoeJulian When adding a brick you have to probe from a trusted to untrusted to add it
18:38 awktane @JoeJulian or server apparently
18:39 awktane Glossary smoffery I like the sound of bricks! You build with bricks! ;)
18:39 JoeJulian awktane: I'm trying to figure out what you're doing wrong which is why I'm asking.
18:40 dbruhn sputnik13net, you can use different front ends and still remain accessible through the other means. Using samba in front of gluster is just resharing the volume through samba
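A minimal sketch of the CTDB-plus-Samba layering dbruhn describes, with every IP, interface and path purely illustrative (and Samba built with cluster support assumed):

    # /etc/ctdb/nodes              one private address per gluster/samba server
    # /etc/ctdb/public_addresses   floating IPs CTDB moves between nodes, e.g.:
    #     192.168.1.100/24 eth0
    #     192.168.1.101/24 eth0
    # smb.conf:
    #     [global]
    #         clustering = yes
    #     [gvol]
    #         path = /mnt/gluster/gvol    # a glusterfs client mount on each server
    #         read only = no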
18:40 JoeJulian maybe I'll scroll all the way back and stop being lazy... ;)
18:40 awktane @JoeJulian I don't mind answering anything. Maybe I will find the screw that fell out.
18:44 zerick joined #gluster
18:47 JoeJulian Mmm, I see the bug you were referencing. I think the best workaround would be to rsync /var/lib/glusterd/vols from the old server to the new rejected server then restart glusterd on the new server (I think just that one should be sufficient, but if not try restarting all glusterd). Don't reinstall or wipe anything else.
18:53 JoeJulian awktane: ^
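Spelled out as commands, JoeJulian's suggestion would look something like this, run on the rejected server, with <good-server> standing in for the old healthy server:

    rsync -av <good-server>:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    service glusterd restart     # glusterfs-server on Debian/Ubuntu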
18:55 foster joined #gluster
18:57 chirino joined #gluster
18:57 jbd1 awktane: I just reproduced your issue with peer rejected
18:57 awktane @jbd1 Wonderful.
18:57 JoeJulian semiosis: does libgfapi produce logs in /var/log/glusterfs?
18:58 semiosis only when you tell it to
18:58 awktane @jbd1 Do you have any volume options set, out of curiosity?
18:58 semiosis JoeJulian: https://github.com/gluster/glusterfs/blob/master/api/src/glfs.h#L196 -- you tell it where to write the log
18:58 glusterbot Title: glusterfs/api/src/glfs.h at master · gluster/glusterfs · GitHub (at github.com)
18:59 [o__o] joined #gluster
18:59 JoeJulian hmm, I wonder where qemu puts them then...
18:59 jbd1 awktane: no, I did the "cleanest" test I could manage.  Created 3 new VMs, installed GlusterFS 3.3 on 1 and 2, created a distributed volume between 1 and 2, mounted from another VM and created some files, upgraded 1 to 3.4, upgraded 2 to 3.4, booted 3, installed GlusterFS 3.4 on 3, peer probed 3 from 1, peer rejected
19:00 Andy5_ JoeJulian: https://github.com/qemu/qemu/blob/stable-1.7/block/gluster.c#L211
19:00 glusterbot Title: qemu/block/gluster.c at stable-1.7 · qemu/qemu · GitHub (at github.com)
19:01 awktane Can you try the steps in http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected to confirm that the same thing happens, all peers then see it as rejected?
19:01 glusterbot Title: Resolving Peer Rejected - GlusterDocumentation (at www.gluster.org)
19:01 awktane @jbd1 ^ oops
19:03 jbd1 awktane: that corrected my issue.
19:04 jbd1 awktane: after step 5, I just rebooted the VM (3).
19:05 awktane @jbd1 Bah so keep on trucking then. I have a 6 server setup. When I do that I can get at most 1-2 of them to not say peer rejected. Usually they all just say peer rejected.
19:06 jbd1 awktane: there is definitely something funky going on here. I'm not exactly filled with confidence about this procedure.
19:06 awktane @JoeJulian Volume file is identical in every way across both servers. Even at a binary level no differences.
19:07 awktane @jbd1 Indeed.
19:08 awktane @jbd1 I threw an e-mail out to gluster-users, next gluster-dev if I don't see anything. After that I'll probably end up attempting to re-create all the things.
19:11 jbd1 awktane: I ran into weird issues when I went to add-brick of the volume on the third vm, so I'm detaching and re-probing
19:12 jbd1 awktane: and I'm back to peer rejected
19:12 awktane @jbd1 Well in a way, I'm glad it's not just me.
19:12 JoeJulian awktane: It's not the vol file, according to that bug report, but rather the info file.
19:13 jbd1 awktane: It's frustrating, getting "peer probe: success" and then seeing State: Peer Rejected (Connected)
19:13 awktane @jbd1 Yeah, watching tcpdump there is a conversation going on but apparently not a very good one.
19:15 awktane @jbd1 JoeJulian may have just poked the brains. /var/lib/glusterd/vols/{brick}/info on old has a "op-version=2" and a "client-op-version=2"
19:15 awktane @jbd1 Whereas the new does not.
19:16 awktane Mismatch and therefore unhappy.
19:16 JoeJulian That's the bit I was expecting that bug meant, but I'm spread 6 ways from sunday right now and still hadn't had a chance to read the patch.
19:17 awktane @JoeJulian Hey now... no bugging my peeps on gluster-users! haha
19:18 JoeJulian It's always fun to return from a week of conferences.
19:18 awktane @jbd1 Oops sorry had my windows crossed. Those lines exist on new but not old.
19:18 dbruhn JoeJulian, was summit cool? I wanted to go
19:18 jbd1 awktane: after going through the rejected stuff again, my node is at "Sent and Received peer request (Connected)" but nothing I do seems to improve from there
19:18 awktane @jbd1 Which means that upgrade is not adding those lines but new 3.4 clients add them automatically?
19:19 JoeJulian It's always fun chatting with people about Gluster.
19:19 JoeJulian One of these times I'll have to go to sessions.
19:19 awktane @jbd1 Yeah, I ran into that a few times too. Cleared out /var/lib/glusterd again and it went away.
19:20 awktane Removed those two new pesky lines and it's happy.
19:21 jbd1 awktane: I see that-- /var/lib/glusterd/vols/gv0/info on the NEW host has op-version=1, client-op-version=1 while same file on the upgraded host does not
19:22 JoeJulian The op-version/client-op-version are supposed to choose the greatest available standard rpc for the versions installed. This allows backward compatibility.
19:22 edong23 is rdma with gluster functional?
19:23 JoeJulian edong23: Some have had great success, some none. I think it's due to driver differences, but that's just a guess.
19:23 edong23 JoeJulian: in 3.4? or 3.5beta?
19:23 edong23 or, do you know?
19:24 awktane @jbd1 If you force a match it appears to work. I can't get it to fail anymore. No oddness.
19:24 edong23 i mean, i know the drivers for my cards are awesome
19:24 JoeJulian Should be the same. I don't think there's been any significant changes between 3.4 and 3.5 in that regard.
19:24 John_HPC joined #gluster
19:26 jbd1 awktane: I tried adding the lines manually on server 1 (doing all the appropriate restarts, etc) and doing so allowed me to (finally) probe server 3, but caused server 2 to move into peer rejected
19:26 jbd1 awktane: now I'm working to get server 2 back into the cluster
19:26 jbd1 awktane: success
19:26 awktane @jbd1 I took all down and made the change on all, then brought them all back up. I'm fine with downtime as long as it's quick.
19:27 awktane Well and for production... in the middle of the night.
19:27 John_HPC What was the command, to display all the variables, like "Option: storage.owner-uid"
19:28 JoeJulian Are you referring to "gluster volume info" or "gluster volume set help" to see the defaults?
19:28 John_HPC Think set help ;)
19:29 jbd1 awktane: too bad we can't just gluster volume set gv0 op-version 1
19:29 JoeJulian awktane: You /should/ be able to make that change live. Restarting glusterd doesn't restart the bricks so that should allow them to join the pool.
19:29 awktane @JoeJulian Yeah I thought that too but I prefer to dance on the safe side when it comes to production.
19:30 jbd1 The 3.3->3.4 upgrade => peer rejected thing is definitely a bug though
19:30 awktane I added some verbiage to the bug report translating what the problem is into English. Hopefully it helps future folk.
19:31 jbd1 awktane: thanks
19:32 awktane @JoeJulian That simple brain poke ended a problem I've been teasing with since 1am so a beer goes out to you.
19:32 JoeJulian Excellent, glad I could help.
19:34 jbd1 awktane: you see op-version=2 ? I see op-version=1
19:38 JoeJulian op-version=1 was 3.3. op-version=2 is 3.4. I'm not sure if the rpc changed for 3.5, but if it did, that should be op-version 3.
19:39 kmai007 is glusterfs 3.5 out? where might I find documentation for it?
19:41 * JoeJulian grumbles about documentation...
19:41 kmai007 JoeJulian: LOL
19:42 JoeJulian https://forge.gluster.org/glusterfs-core/glusterfs/trees/master/doc
19:42 glusterbot Title: Tree for glusterfs in GlusterFS Core - Gluster Community Forge (at forge.gluster.org)
19:43 nueces joined #gluster
19:43 JoeJulian There was a push-back to require all new features to be documented before the release of 3.5.0 but that push-back failed.
19:43 kmai007 so all glusterfs stuff is on forge? not www.gluster.org ?
19:44 kmai007 i have a push-back widget we can use
19:44 hagarth JoeJulian: the push back was uncalled for ;) .. there's reasonable amount of documentation for features in 3.5
19:44 JoeJulian :D
19:45 JoeJulian Aren't there open bugs for missing docs?
19:45 hagarth JoeJulian: yes, let me pull that out
19:46 JoeJulian To be honest, though, I haven't had the time to look at all the details.
19:46 kmai007 where can i read more about readdirp ?
19:46 hagarth JoeJulian: https://bugzilla.redhat.com/show_bug.cgi?id=1071800 is the 3.5.1 tracker
19:46 glusterbot Bug 1071800: unspecified, unspecified, ---, vbellur, NEW , 3.5.1 Tracker
19:46 JoeJulian @lucky readdirplus
19:46 glusterbot JoeJulian: http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.prftungd/doc/prftungd/use_readdirplus_ops.htm
19:47 JoeJulian meh, maybe not that easy.
19:47 hagarth kmai007: commit log in http://review.gluster.org/4519 might help
19:47 glusterbot Title: Gerrit Code Review (at review.gluster.org)
19:47 jbd1 so when I upgrade 3.3->3.4, the process will have to be 1. upgrade each server, rebooting in-between, 2. upgrade all clients and re-mount, 3. update the info file for all volumes to say op-version=2 and client-op-version=2 prior to adding any new bricks
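A sketch of step 3 as it emerges from this discussion; <volname> is a placeholder, and the exact file layout is best compared against a freshly created 3.4 volume rather than taken from here:

    # on each server, with glusterd stopped, see whether the lines exist:
    grep -E '^(client-)?op-version' /var/lib/glusterd/vols/<volname>/info
    # make the info file identical on every server -- either add
    #   op-version=2
    #   client-op-version=2
    # where they are missing, or remove them everywhere -- then start glusterd again.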
19:47 kmai007 hagarth:thanks
19:48 JoeJulian hagarth: Sorry we didn't get a chance to talk face-to-face about my emails. I just wanted to make sure you knew I wasn't being hostile.
19:49 hagarth JoeJulian: I have no doubt about your intentions :) .. I hope I was not seen as hostile too!
19:50 JoeJulian never.
19:50 hagarth I will bbiab
19:51 awktane @jbd1 Mine too.
19:56 sputnik13net is anyone using gluster as nfs storage for vmware esxi?
19:56 John_HPC Anyone have experiance with setting "network.tcp-window-size"?
19:58 Andy5_ semiosis: question regarding libgfapi: shall it handle reconnection automagically or is there anything required from the side of the caller? (in case of brick failures).
19:59 JoeJulian Andy5_: It uses the standard translators, including the client translator. The client translator handles reconnection. No action should be required.
20:00 Andy5_ thanks. I'm compiling qemu with logging to file. hope to have more info soon.
20:09 _Bryan_ joined #gluster
20:14 MeatMuppet Guys, we're having an issue on our prod openstack environment, which is backed by gluster using two replicas (I know.  I wasn't given a choice.)  We lost storage on one of the replica servers and so had to replace failed bricks.  The heal operation on Cinder and Nova volumes is coming up on the two-week mark and it seems as if it will never catch up and finish.  Nova heal info shows a constantly fluctuating list with multiple heals on many of the files
20:17 MeatMuppet Running 3.4.1
20:21 mjsmith2 joined #gluster
20:26 awktane @MeatMuppet Two weeks? How much data?
20:27 MeatMuppet nova is at 1.1T, cinder is about 30T of sparse files, so about 6T utilized.
20:28 awktane @MeatMuppet It took a day or two for our prod server to recover from about 60G.
20:28 awktane (under load of course)
20:29 MeatMuppet awktane: we're pretty heavily loaded as well.
20:30 awktane @MeatMuppet We're actually in the process of adding another distributed replication pair as well to help level the load a bit. Hopefully someone is able to help you out with speeding it up.
20:30 ricky-ticky joined #gluster
20:32 MeatMuppet Thx.  I've tried bumping up cluster.background-self-heal-count but I'm not sure how high that can go without bad things happening.
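For reference, the kind of commands being discussed, with <volname> and the values purely illustrative (raising heal parallelism adds load on the bricks, so these are not recommendations):

    gluster volume heal <volname> info                                   # what is still pending
    gluster volume set <volname> cluster.background-self-heal-count 20
    gluster volume set <volname> cluster.self-heal-window-size 2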
20:32 brutuz joined #gluster
20:34 awktane @MeatMuppet Out of curiosity how many servers are you running?
20:35 dbruhn 2
20:35 MeatMuppet just two right now.  we have more waiting in the wings
20:38 social joined #gluster
20:38 MeatMuppet They are OCP Winterfells and Knox trays.  There doesn't appear to be any resource starvation.  Disks are active but nowhere near saturation.  CPU and memory are likewise good.
20:46 semiosis Andy5_: libgfapi is essentially the same as a native FUSE client but without the FUSE xlator at the top (or bottom, if you're JoeJulian :) of the graph.
20:47 semiosis Andy5_: so network/brick connectivity issues would be handled the same way, by the same code in fact
20:50 Andy5_ semiosis: got it. I'm trying to understand why kvm+qemu has troubles accessis the volume after a brick failover. seems not to recover access to the volume and dies.
20:50 Andy5_ s/accessis/accessing/
20:50 glusterbot What Andy5_ meant to say was: semiosis: got it. I'm trying to understand why kvm+qemu has troubles accessing the volume after a brick failover. seems not to recover access to the volume and dies.
20:52 Andy5_ Is the caller to libgfapi supposed to do anything on "notifying CHILD-UP" event ?
20:59 semiosis tbh i haven't tried any failure scenarios with libgfapi clients yet, but i would expect it to be transparent to the caller
21:00 semiosis at least until a fatal error, when i expect the caller to get some E_ thing
21:01 Andy5_ semiosis: what happens is that I have 2 replica. I fail the first brick and KVM works from the second. Then I restart the first and heal. After heal finished, I stop brick 2 and KVM throws disk errors.
21:02 semiosis Andy5_: are you sure the client was ever really connected to brick 1?
21:02 Andy5_ so say the logs. it reconnects after the failure.
21:03 semiosis hmm odd
21:03 Andy5_ here's the logs: http://pastie.org/9098524
21:03 glusterbot Title: #9098524 - Pastie (at pastie.org)
21:05 Andy5_ these are logs from patches qemu which logs to disk.
21:05 Andy5_ s/patches/patched/
21:05 glusterbot What Andy5_ meant to say was: these are logs from patched qemu which logs to disk.
21:05 semiosis right
21:07 Andy5_ looks like a bug.
21:07 Andy5_ I'm not sure why it tries to heal replica-0 which was already healed beforehand.
21:16 B21956 joined #gluster
21:16 failshel_ joined #gluster
21:22 ricky-ticky joined #gluster
21:30 JoeJulian Andy5_: After the heal is finished and before killing the second replica, let's see "getfattr -m . -d -e hex $brick/$imagefile" on each server.
21:31 Andy5_ ok. trying now.
21:35 Andy5_ JoeJulian: they're the same. http://pastie.org/9098571
21:35 glusterbot Title: #9098571 - Pastie (at pastie.org)
21:35 JoeJulian Interesting
21:35 JoeJulian There's nothing there that indicates a heal being required.
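To make that concrete: on a healthy replica the trusted.afr pending counters in the getfattr output are all zero. Illustrative output shape only, with the volume name "gv0" assumed:

    # trusted.afr.gv0-client-0=0x000000000000000000000000
    # trusted.afr.gv0-client-1=0x000000000000000000000000
    # trusted.gfid=0x...
    # non-zero trusted.afr values are pending data/metadata/entry counts
    # against that brick, i.e. a heal would still be required.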
21:35 Andy5_ JoeJulian: I confirm. killing the second brick, kills kvm.
21:36 Andy5_ yep. they're in sync.
21:36 JoeJulian You have a bug filed, right?
21:36 Andy5_ not yet.
21:37 JoeJulian Go ahead and file it. Include those logs and the getfattr data. File it against replicate.
21:37 Andy5_ I'm new to gluster dev. where's the gluster bugtracker?
21:37 JoeJulian file a bug
21:37 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
21:38 Andy5_ ouch. I need to register. Ok. Doing it now.
21:43 marcoceppi joined #gluster
21:43 marcoceppi joined #gluster
21:47 cleverfoo joined #gluster
21:47 cleverfoo hey folks
21:50 cleverfoo hey folks anyone know a good gluster consulting engineer/company?
21:50 ricky-ticky joined #gluster
21:51 JoeJulian @commercial
21:51 glusterbot JoeJulian: Commercial support of GlusterFS is done as Red Hat Storage, part of Red Hat Enterprise Linux Server: see https://www.redhat.com/wapps/store/catalog.html for pricing also see http://www.redhat.com/products/storage/ .
21:52 JoeJulian Other than that, I know of nobody that makes a living at consulting.
21:52 cleverfoo joined #gluster
21:53 cleverfoo thanks, any independent consultants?
21:53 cleverfoo worry about the lead time on dealing with RH
21:53 cleverfoo legal etc
21:54 cleverfoo @JoeJulian thx tho
21:54 JoeJulian Odd. Most people that use RH do so to meet legal/regulatory requirements from what I've seen.
22:01 marcoceppi joined #gluster
22:01 marcoceppi joined #gluster
22:02 Andy5_ that was hard. bug filed here: https://bugzilla.redhat.com/show_bug.cgi?id=1089758
22:02 glusterbot Bug 1089758: unspecified, unspecified, ---, pkarampu, NEW , KVM+Qemu + libgfapi: problem dealing with failover of replica bricks causing disk corruption and vm failure.
22:06 cleverfoo joined #gluster
22:07 cleverfoo @JoeJulian sorry that’s not what I meant, I’m just interested in something faster than talking to an RH sales guy and having to wait for a consulting contract to push through legal
22:07 JoeJulian Ah
22:07 cleverfoo not averse to it tho
22:08 JoeJulian For more money than it's worth, I would take the time off to do it. ;)
22:08 cleverfoo understood
22:10 cleverfoo @Andy5_  that is one scary bug
22:11 Andy5_ cleverfoo: yes. quite expensive too.
22:12 cleverfoo I bet dude, hope you didn’t lose any important data
22:12 * JoeJulian needs to spearhead an educational program dedicated to teaching the difference between lose and loose. ;)
22:13 cleverfoo ah
22:13 cleverfoo oops my bad
22:13 Andy5_ did not loose anything (have backups), but my uptime went south on a couple of key servers.
22:13 JoeJulian You're not the only one.
22:13 Andy5_ thanks for asking.
22:13 JoeJulian Many many times people have loosed their data... I'm not sure how it gets loose but there you have it...
22:14 cleverfoo we haven’t switched to libgfapi but I’ll prob wait until that bug is closed
22:14 cleverfoo @JoeJulian http://instantrimshot.com
22:15 Andy5_ got me! lose
22:17 glusterbot New news from newglusterbugs: [Bug 1089758] KVM+Qemu + libgfapi: problem dealing with failover of replica bricks causing disk corruption and vm failure. <https://bugzilla.redhat.com/show_bug.cgi?id=1089758>
22:20 JoeJulian There shouldn't be anything about libgfapi that makes that any different than using the fuse client in regard to that bug.
22:21 gojk joined #gluster
22:22 JoeJulian In fact... I think there might have been reports of this bug over fuse before but nobody could explain their issue in a way that this could be isolated.
22:29 Andy5_ It took a few days to start pinpointing the problem. It's also subtle: if you migrate the VM to another node, it will re-establish connections properly.
22:30 Andy5_ JoeJulian: thanks for suggesting to lookup the gfid directly. I was never 100% sure replication was finished.
22:30 JoeJulian Probably because it reopens the FD.
22:31 chirino joined #gluster
22:31 Andy5_ yes. it's a new process on a new machine. it starts from scratch.
22:31 rahulcs joined #gluster
22:31 marcoceppi joined #gluster
22:31 marcoceppi joined #gluster
22:33 JonnyNomad left #gluster
22:33 JoeJulian Seems like it should be able to be simulated by opening a file, locking it similarly, and doing the same process.
22:35 badone joined #gluster
22:39 JoeJulian cleverfoo: Are these active cinder volumes?
22:40 Andy5_ there is also something with the client versions. qemu was compiled using dev libraries from 3.4.2. the server is 3.4.3 but in the logs they agree on protocol 3.3.0
22:40 JoeJulian Andy5_: qemu isn't statically linked is it?
22:41 Andy5_ I don't think it's statically linked.
22:41 JoeJulian Unless you compiled it to be static yourself it's not.
22:42 JoeJulian And the 3.3.0 rpc protocol is correct.
22:42 Andy5_ Ah. ok then.
22:43 Andy5_ let's see what RH says. if confirmed, this should be quite a show stopper.
22:50 elico Is there any video teaching about gluster sync/resync, heal, and split-brain issues?
22:54 elico also about gluster bd volumes. are there any full instructions about it? thanks.
22:54 badone joined #gluster
22:55 JoeJulian elico: No videos that I've seen on those topics. I've been tempted but I'd want animations and my animation department (my son) has no interest in helping.
22:57 elico JoeJulian: So what do you think about a video?
22:57 marcoceppi joined #gluster
22:57 marcoceppi joined #gluster
22:58 elico What animations?
22:58 JoeJulian If I had animations I wouldn't have to explain it. ;)
22:58 JoeJulian Just "how it works" stuff.
22:58 elico OK
22:58 elico do you have any idea in mind?
22:59 elico an abstraction...
22:59 elico using libreimpress to do some moving drawings is kind of basic.. I think..
23:00 JustinClift joined #gluster
23:02 chirino joined #gluster
23:02 elico JoeJulian: if you have couple shapes in mind I think we might be able to create something. What do you think about it?
23:02 cleverfoo joined #gluster
23:04 JustinClift avati: Thanks for op :)
23:10 JustinClift joined #gluster
23:12 JustinClift joined #gluster
23:13 JustinClift joined #gluster
23:14 velladecin1 @ports
23:14 glusterbot velladecin1: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
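Translated into firewall rules, the factoid above maps to something like the following for a 3.4 server; the brick-port upper bound is arbitrary (bricks take one port each counting up from 49152, or from 24009 on pre-3.4):

    iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT   # glusterd management (+rdma)
    iptables -A INPUT -p tcp --dport 49152:49200 -j ACCEPT   # brick daemons (3.4+)
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT   # gluster NFS + NLM
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT           # rpcbind/portmap
    iptables -A INPUT -p udp --dport 111 -j ACCEPT
    iptables -A INPUT -p tcp --dport 2049 -j ACCEPT          # NFS (3.4+)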
23:14 badone joined #gluster
23:14 JustinClift joined #gluster
23:14 JoeJulian elico: If you want to put something together, I'd be happy to critique and add information.
23:19 JustinClift joined #gluster
23:20 marcoceppi joined #gluster
23:20 marcoceppi joined #gluster
23:20 mjsmith2 joined #gluster
23:22 elico JoeJulian: I don't have even an idea on what to do and how to do.
23:22 elico I know a bit about powerpoint and libreimpress which might help for this.
23:22 elico Do you have any list of subjects which you think about that will fit into a moving slide?
23:23 plarsen joined #gluster
23:24 JoeJulian AFR, DHT, split-brain, heal, add-brick, remove-brick...
23:24 JoeJulian volume creation...
23:24 elico OK so there is a way to put it all together step by step
23:25 elico I have heard about AFR and DHT but have not understood how they fit into GlusterFS yet
23:25 JoeJulian Here's a page I started on that a long time ago: http://www.gluster.org/community/documentation/index.php/GlusterFS_Concepts
23:25 glusterbot Title: GlusterFS Concepts - GlusterDocumentation (at www.gluster.org)
23:26 JoeJulian combine that with information from...
23:26 JoeJulian @luck dht misses are expensive
23:26 elico it takes ages for me to load it :\
23:26 JoeJulian @lucky dht misses are expensive
23:26 glusterbot JoeJulian: http://joejulian.name/blog/dht-misses-are-expensive/
23:27 JoeJulian dialup?
23:30 elico JoeJulian: dial-up dsl which sometimes drops speed from 15Mbps to about 0.2Mbps
23:30 JoeJulian eww
23:32 elico and the dsl provider wanted to sell me 40Mbps with 2-3Mbps upload... until the technician came by and told me that it's better with 15Mbps and 0.7 Mbps upload and wait with the high speeds..
23:35 mjrosenb joined #gluster
23:36 elico JoeJulian: nice DHT description. I assume it's the same with BitTorrent
23:42 mjrosenb joined #gluster
23:59 marcoceppi joined #gluster
23:59 marcoceppi joined #gluster
