
IRC log for #gluster, 2016-03-16


All times shown according to UTC.

Time Nick Message
00:00 JoeJulian Yeah, that's where config management saves the day.
00:01 haomaiwa_ joined #gluster
00:11 JoeJulian syadnom: You were the one that mentioned the feature request. Looks like it's already there: http://gluster.readthedocs.org/en/latest/Administrator%20Guide/Managing%20Volumes/#non-uniform-file-allocation
00:11 glusterbot Title: Managing Volumes - Gluster Docs (at gluster.readthedocs.org)
00:14 muneerse2 joined #gluster
00:20 plarsen joined #gluster
00:21 Merlin_ joined #gluster
00:28 d0nn1e joined #gluster
00:42 d0nn1e joined #gluster
00:44 Merlin_ joined #gluster
00:47 tessier volume sync: failed: Staging failed on 1b487ceb-1d3a-4d80-a162-a707ce885898. Error: 10.0.1.20, is not a friend
00:47 tessier This is a new one...
00:47 tessier 10.0.1.20 was a rejected peer. So I wiped its config except for uuid per the docs. And now it says it has no volumes. So I think a sync might fix it. Nope, not a friend.
00:47 tessier gluster peer status shows both brick servers, however.
00:48 tessier Ah, but they don't say peer in cluster. Instead they say State: Accepted peer request (Connected)
00:48 tessier I'm not sure if I'm getting this cluster more or less borked the more I hack on it.
00:50 tessier ah... restarting glusterd seems to have helped.
00:50 tessier Yeay, volumes are back.
01:00 atrius joined #gluster
01:01 haomaiwa_ joined #gluster
01:03 tessier hmm...but one volume is completely MIA.
01:05 Merlin_ joined #gluster
01:14 tessier Ugh...now it's back in peer rejected state. I have no clue what is going on here.
01:17 johnmilton joined #gluster
01:21 EinstCrazy joined #gluster
01:22 Merlin__ joined #gluster
01:23 haomaiwa_ joined #gluster
01:25 tessier I wonder if I just need to completely wipe out this brick server somehow and start over.
01:30 amye joined #gluster
01:37 Gaurav__ joined #gluster
01:39 Merlin_ joined #gluster
01:54 dlambrig_ joined #gluster
01:55 anmol joined #gluster
01:59 baojg joined #gluster
01:59 Merlin_ joined #gluster
01:59 JoeJulian tessier: stop glusterd, rsync /var/lib/glusterd/vols from a good server, start glusterd.
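[A rough shell sketch of JoeJulian's suggestion, to be run on the broken node; "good-server" is a placeholder hostname, and the trailing slash on the rsync source keeps vols from being nested one level deeper:]
    systemctl stop glusterd            # or: service glusterd stop
    rsync -av good-server:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    systemctl start glusterd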
02:00 haomaiwa_ joined #gluster
02:01 haomaiwang joined #gluster
02:11 Lee1092 joined #gluster
02:19 Merlin_ joined #gluster
02:20 baojg joined #gluster
02:22 johnmilton joined #gluster
02:23 hchiramm joined #gluster
02:23 hchiramm_ joined #gluster
02:32 Merlin_ joined #gluster
02:32 harish joined #gluster
02:52 nangthang joined #gluster
02:52 Merlin_ joined #gluster
03:01 haomaiwa_ joined #gluster
03:04 natarej__ joined #gluster
03:10 baojg joined #gluster
03:12 Merlin_ joined #gluster
03:25 dannyb joined #gluster
03:25 overclk joined #gluster
03:29 calavera joined #gluster
03:34 Merlin_ joined #gluster
03:34 atinm joined #gluster
03:34 shubhendu joined #gluster
03:39 kkeithley1 joined #gluster
03:40 ramteid joined #gluster
03:43 baojg joined #gluster
03:44 dlambrig_ joined #gluster
03:46 anmol joined #gluster
03:49 shubhendu__ joined #gluster
03:54 Merlin_ joined #gluster
03:56 natarej__ joined #gluster
03:56 aravindavk joined #gluster
03:56 nbalacha joined #gluster
04:01 haomaiwa_ joined #gluster
04:01 nishanth joined #gluster
04:02 itisravi joined #gluster
04:03 post-fac1um joined #gluster
04:05 jiffin joined #gluster
04:08 dlambrig_ joined #gluster
04:10 kanagaraj joined #gluster
04:10 gem joined #gluster
04:14 itisravi joined #gluster
04:15 shubhendu joined #gluster
04:18 Merlin_ joined #gluster
04:19 ppai joined #gluster
04:27 Merlin_ joined #gluster
04:28 natarej joined #gluster
04:31 Gnomethrower joined #gluster
04:38 mowntan joined #gluster
04:38 mowntan joined #gluster
04:40 sakshi joined #gluster
04:42 kotreshhr joined #gluster
04:43 Merlin_ joined #gluster
04:51 nehar joined #gluster
04:52 RameshN joined #gluster
04:57 ndarshan joined #gluster
04:59 Merlin_ joined #gluster
05:01 anmol joined #gluster
05:01 haomaiwa_ joined #gluster
05:05 skoduri joined #gluster
05:08 calavera joined #gluster
05:08 Manikandan joined #gluster
05:12 pur joined #gluster
05:26 atalur joined #gluster
05:26 karthik___ joined #gluster
05:29 ashiq joined #gluster
05:32 Merlin_ joined #gluster
05:32 jiffin1 joined #gluster
05:33 skoduri_ joined #gluster
05:35 beeradb joined #gluster
05:45 Gaurav__ joined #gluster
05:46 Merlin_ joined #gluster
05:47 hgowtham joined #gluster
05:47 Apeksha joined #gluster
05:49 tessier JoeJulian: Now my node won't start at all. Logfile: http://fpaste.org/340662/58107354/
05:49 glusterbot Title: #340662 Fedora Project Pastebin (at fpaste.org)
05:51 arcolife joined #gluster
05:54 archit_ joined #gluster
05:57 skoduri joined #gluster
05:57 atinm tessier, let me take a look
05:58 atinm tessier, what's the output of ls /var/lib/glusterd/vols/9l/ ?
06:01 atinm tessier, I'll be back in a few minutes
06:01 haomaiwa_ joined #gluster
06:03 Merlin_ joined #gluster
06:05 rafi joined #gluster
06:09 daMaestro joined #gluster
06:11 karnan joined #gluster
06:14 poornimag joined #gluster
06:16 kshlm joined #gluster
06:17 anil_ joined #gluster
06:18 spalai joined #gluster
06:18 Merlin_ joined #gluster
06:27 nangthang joined #gluster
06:31 Gnomethrower joined #gluster
06:33 muneerse joined #gluster
06:41 Merlin_ joined #gluster
06:42 mhulsman joined #gluster
06:43 tessier atinm: Just saw your messages...taking a look at 9l...
06:44 tessier atinm: If you are just looking for an ls -la here it is: http://fpaste.org/340688/58110658/
06:44 glusterbot Title: #340688 Fedora Project Pastebin (at fpaste.org)
06:45 tessier Looks reasonable to me
06:45 prasanth joined #gluster
06:46 mbukatov joined #gluster
06:47 atinm tessier, info file is of zero size
06:47 atinm tessier, did you run into a space issue by any chance?
06:47 atinm tessier, which version of gluster is it?
06:48 tessier atinm: No, didn't run out of space. Gluster 3.7.0
06:49 tessier I've been having all kinds of trouble with this cluster. JoeJulian most recently suggested I sync the vols dir from a working brick server to this one.
06:49 tessier oh crap.
06:49 tessier You called it. /var is full on the other brick server.
06:50 atinm tessier, :)
06:51 atinm tessier, can you just delete the 9l folder and restart glusterd?
06:51 tessier No wonder it's been so flaky.
06:53 tessier uh oh... I restarted glusterd after clearing the full disk, just to make sure everything was sane again, and now glusterd won't start on the only brick server that was previously working! :(
06:53 tessier That was a bad idea. As now production data is inaccessible.
06:53 tessier Fortunately it's night. But now I can't sleep until this is running again.
06:54 tessier And now it is having the exact same error as before. So....how do I restore the vol file? :(
06:55 vmallika joined #gluster
06:55 Merlin_ joined #gluster
06:57 bhuddah joined #gluster
06:57 * tessier wonders how it ever became zero
06:59 kshlm tessier, Does either server have non-zero files?
06:59 kshlm in /var/lib/glusterd
07:01 tessier Hmm....the one I just restarted doesn't have non-zero files.
07:01 tessier But it still complains about the vol file...let me paste that log...
07:01 haomaiwa_ joined #gluster
07:02 tessier http://fpaste.org/340693/81117121/
07:02 glusterbot Title: #340693 Fedora Project Pastebin (at fpaste.org)
07:02 tessier oh, we're looking at info files...
07:03 tessier info files are identical on both sides.
07:03 atalur joined #gluster
07:04 tessier So how can I reconstruct the info files?
07:04 tessier Not all are zero length but the same ones on each side are.
07:05 tessier Why would the info files ever be moved or recreated such that a full disk would do this anyway?
07:05 tessier ah...volume 9j which is the most important one with the only real production data on it still has its info file.
07:08 arcolife joined #gluster
07:09 hchiramm_ joined #gluster
07:10 Merlin_ joined #gluster
07:10 hchiramm joined #gluster
07:11 cholcombe joined #gluster
07:11 archit_ joined #gluster
07:12 atinm tessier, you mean to say the info file is of zero size on the other node as well?
07:12 tessier Correct
07:12 kshlm tessier, So the log you pasted indicates that the info file got truncated at the end somehow.
07:12 atinm tessier, so that means you ran out of space on the other node too?
07:13 kshlm You could compare the info for 9l and 9j , to see if something is missing.
07:13 tessier atinm: No...didn't run out of space on the other node. While trying to figure out what was going on it was suggested that I rsync the vols files from one node to the other node, before we realized some files were truncated.
07:13 tessier So the zero length files got rsync'd over. :(
07:13 JoeJulian srry
07:14 tessier JoeJulian: Not your fault. I should have caught the full disk sooner.
07:14 JoeJulian I did say to rsync from the "good one" to the "bad one" ;)
07:14 kshlm Ouch!
07:14 tessier JoeJulian: Turns out they were both bad in a way. :)
07:15 tessier Really, I don't care if I have to blow up the whole cluster, all the volumes and everything, it's just two nodes and one client I care about at the moment. I just want to get that one volume mounted back on the only client. I can get the other volumes up later.
07:15 JoeJulian tessier has a good point. New info files should be written to a temporary file and moved in to place rather than being overwritten.
07:16 tessier By blow up the cluster I mean not lose the data but just re-establish all of the peers, uuids, etc. Whatever is most expedient.
07:16 atinm JoeJulian, we do that: we create a tmp file, write into it and then rename it to the original file. That's how it works, so we need to figure out how we ended up with a zero-byte info file
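[For reference, the write-then-rename pattern atinm describes looks roughly like this in shell terms; it is only an illustration of the idiom, not glusterd's actual code:]
    # build the new contents in a temp file on the same filesystem
    cat > /var/lib/glusterd/vols/9j/info.tmp <<'EOF'
    ...new contents...
    EOF
    # rename() is atomic, so readers see either the old file or the
    # complete new one -- never a half-written or zero-byte file
    mv /var/lib/glusterd/vols/9j/info.tmp /var/lib/glusterd/vols/9j/info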
07:17 atinm kshlm, IIRC we saw a similar issue on another setup as well, I think there is some bug in the code
07:17 JoeJulian Well then... that's a horse of a different color.
07:17 tessier There may be some code path which doesn't create the tmp file.
07:17 tessier I would be happy to reproduce it...on another set of hardware that I have.
07:17 tessier I've done a ton of monkeying around with this trying to get these glitches ironed out. All of the glitches caused the log volume to explode which somehow led to zero length info files.
07:19 tessier The peers files were all zero length too
07:19 kshlm tessier, That's what is surprising to me. The code as it is, shouldn't be doing this.
07:19 Merlin_ joined #gluster
07:19 kshlm The temp file & rename was introduced to avoid these issues.
07:20 tessier This is gluster 3.7.0 fwiw
07:20 tessier I know 3.7.6 or so is out, will upgrade if I can ever get this thing stable again.
07:21 unlaudable joined #gluster
07:23 jtux joined #gluster
07:28 tessier So...any ideas? Anyone do paid gluster consulting? At this point I should probably find some real professional help as this is getting silly. I've spent way too much time on this only to end up with it getting more and more broken the more I hack on it. I was fairly impressed that up until this point the data had never actually become unavailable.
07:35 [Enrico] joined #gluster
07:36 kshlm tessier, I'll try to help.
07:36 kshlm Your 9l volume is broken right?
07:37 kshlm Can you paste the partial info file that you have in a paste server?
07:37 tessier kshlm: Thanks. Various volumes are broken.
07:37 tessier Sure
07:37 kshlm How many volumes do you have?
07:38 Merlin_ joined #gluster
07:39 tessier http://fpaste.org/340709/13938145/
07:39 glusterbot Title: #340709 Fedora Project Pastebin (at fpaste.org)
07:39 tessier That shows all of my volumes, which ones have broken info files, and the contents of a good info file
07:40 post-factum joined #gluster
07:43 kshlm Okay. GlusterD should restore volumes in alphabetical order.
07:44 tessier ok...so we get 9a fixed first?
07:44 kshlm Your previous log snippet showed that 9l failed to restore, so I thought everything up to 9l was good.
07:45 kshlm The list you posted, which system was it from?
07:45 kshlm The one that was working correctly?
07:45 tessier The above paste is from 10.0.1.21 aka disk10 which was working until I restarted glusterd after the disk had filled and ended up with 0 length info files.
07:46 [diablo] joined #gluster
07:46 kshlm So from both the systems, how many non-zero info files do you have?
07:47 tessier 3 non-zero info files. The same ones on each system.
07:50 kshlm Ah, this will be harder.
07:50 kshlm Just to make sure that the non-zero ones work, take a backup of /var/lib/glusterd/.
07:51 tessier ok...
07:51 tessier Done
07:51 kshlm Then delete every other volume in /var/lib/glusterd/vols/ other than the ones with the info files.
07:51 kshlm Start glusterd.
07:51 Wizek joined #gluster
07:52 kshlm If glusterd starts then these are good.
07:52 kshlm Just do it on one system now.
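[A sketch of that test, assuming the volumes you end up keeping are 9d and 9j (the two that work later in the log); back up first, prune, then try starting:]
    systemctl stop glusterd
    cp -a /var/lib/glusterd /var/lib/glusterd.bak.$(date +%F)
    cd /var/lib/glusterd/vols
    for v in *; do
        case "$v" in 9d|9j) ;; *) rm -rf "$v" ;; esac
    done
    systemctl start glusterd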
07:54 tessier http://fpaste.org/340718/81148331/
07:54 glusterbot Title: #340718 Fedora Project Pastebin (at fpaste.org)
07:54 tessier No go. Check out that tmp file...
07:55 Merlin_ joined #gluster
07:55 tessier I don't see any zero length files in 9d or 9j other than quota.conf
07:55 tessier So maybe if we also remove 9b it would then work...
07:56 hgichon0 joined #gluster
07:56 tessier Nope... http://fpaste.org/340719/81149621/
07:56 glusterbot Title: #340719 Fedora Project Pastebin (at fpaste.org)
07:56 tessier What is the "management" volume?
07:56 deniszh joined #gluster
07:56 kshlm That's glusterd.
07:57 kshlm Okay. So brick resolve failed.
07:58 kshlm When you rsynced, did you rsync over the whole of /var/lib/glusterd or was it just /var/lib/glusterd/vols
07:58 tessier Just the vols
07:59 kshlm Okay. Good to know.
08:00 kshlm Can you start glusterd with debug logs? Just run `glusterd -LDEBUG`
08:00 kshlm That should help identify why brick resolution failed
08:01 tessier ok...
08:01 haomaiwa_ joined #gluster
08:02 tessier http://fpaste.org/340723/11534814/
08:02 glusterbot Title: #340723 Fedora Project Pastebin (at fpaste.org)
08:02 tessier Hope that's enough of the log...
08:04 fsimonce joined #gluster
08:06 kshlm tessier, Could you check if /var/lib/glusterd/peers has any zero files?
08:06 jri joined #gluster
08:06 kshlm Now that I think about it, if it had zero files, GlusterD should have failed even earlier.
08:07 kshlm So, glusterd is failing to identify the IP 10.0.1.20 . I hope this is the IP for the other server.
08:07 kshlm Can you see if any file in /var/lib/glusterd/peers contains this?
08:07 tessier Yes, it does have zero length files for the peers.
08:07 tessier And yes, 10.0.1.20 is the IP for the other server
08:08 kshlm Oh!
08:08 kshlm Okay. Let's recreate that first.
08:08 kshlm Give me a couple of minutes. I need to find a sample file.
08:09 tessier ok...
08:10 DV joined #gluster
08:12 kshlm http://fpaste.org/340731/14581159/
08:12 glusterbot Title: #340731 Fedora Project Pastebin (at fpaste.org)
08:12 kshlm So the peerinfo file in /var/lib/glusterd/peers should look like the one I pasted.
08:13 kshlm Just replace the uuid and hostname with the correct ones for the other server
08:13 kshlm The uuid should be the same as the filename
08:13 kshlm tessier, ^
08:13 tessier ok...just a moment...
08:14 tessier How do I get the uuid of the other server?
08:15 tessier ah...I bet it's in glusterd.info
08:15 kshlm Yup.
08:16 kshlm If you had an empty file, its name would be the uuid
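[For anyone reconstructing this later, a 3.7-era peerinfo file looks roughly like the following (field names from memory, so verify against kshlm's paste); the filename and the uuid= line are the other server's UUID, copied from its /var/lib/glusterd/glusterd.info, and state=3 means "peer in cluster":]
    # /var/lib/glusterd/peers/<uuid-of-10.0.1.20>
    uuid=<uuid-of-10.0.1.20>
    state=3
    hostname1=10.0.1.20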
08:16 tessier Done. And glusterd started!
08:16 unlaudable joined #gluster
08:16 kshlm Cool. This means two of your volumes work.
08:16 tessier gluster volume status shows not online though.
08:17 kshlm Do a `gluster volume start <name> force` and try status again.
08:17 tessier ah...now it says online.
08:18 kshlm Awesome.
08:18 tessier And the client is happy and sees the data again!
08:18 kshlm Can you do the same on the other server? Just to make sure that the volumes work.
08:19 kshlm Oh cool.
08:19 tessier Sure
08:20 kshlm You should have just the two volumes in /var/lib/glusterd/vols after this.
08:20 kshlm We'll continue to try and restore the others after this.
08:21 Wizek joined #gluster
08:22 kbyrne joined #gluster
08:24 tessier Ok, the other machine is happy now also and sees the same volumes. I noticed that the peer file on the other machine had the IP of its pair but the uuid was wrong. I corrected that. That could have been part of my problem the last couple of days also.
08:24 tessier Now they are both in the cluster and show both volumes up and happy.
08:24 bowhunter joined #gluster
08:26 kshlm Cool. Now let's start with recovering the other volumes.
08:27 kshlm Which one would you like to recover most?
08:27 kshlm We'll attempt it first and then go ahead with the others.
08:28 tessier All of the others are of equal importance so let's go with 9a
08:28 kshlm Okay. So you have an empty info file.
08:29 TvL2386 joined #gluster
08:29 kshlm I have 2 questions for you now.
08:29 kshlm 1. What sort of a volume was it?
08:29 kshlm 2. Did you set any volume options on it.
08:30 EinstCrazy joined #gluster
08:30 nangthang joined #gluster
08:32 tessier 1. I'm not sure how to answer that...what sorts of volumes are there? It was replicated twice. Once on 10.0.1.20 and on 10.0.1.21
08:32 tessier Just like the others. They are all pretty much identical as far as gluster config goes. Created exactly the same.
08:32 tessier 2. No special options were set. All defaults.
08:39 robb_nl joined #gluster
08:42 tessier Now glusterfsd is cranking away at healing that volume. It's been broken for several days.
08:42 kshlm joined #gluster
08:42 gowtham joined #gluster
08:43 kshlm tessier, Sorry. My internet dropped off.
08:43 tessier No problem.
08:43 bhuddah joined #gluster
08:43 kshlm So all your volumes were nearly identical?
08:45 kshlm Should be easy to recreate the info files.
08:46 tessier Yep
08:46 dlambrig_ joined #gluster
08:49 kshlm Okay. So lets try this.
08:49 JoeJulian rm -rf /
08:49 kshlm Create the 9a directory in /var/lib/glusterd/vols
08:49 tessier heh
08:50 kshlm copy the info file from 9j into it.
08:50 kshlm Since you said they're identical, all you'll need to correct will be the volume-id and the bricks in the info file.
08:51 kshlm You should be able to get the volume-id from the brick path.
08:51 mhulsman joined #gluster
08:51 tessier ok...
08:52 Merlin_ joined #gluster
08:54 tessier get the volume-id from the brick path? I don't follow that...
08:55 tessier There's also a username and password in the info file also which look like uuids
08:56 ctria joined #gluster
08:56 kshlm `getfattr -m .volume-id -d -e hex  <path to a 9a brick>`
08:57 tessier ah, right. Forgot about that.
08:57 kshlm That should get you the volume-id. You'll have to correctly format it as a uuid.
08:57 kshlm The username and password shouldn't matter much. Even if they're the same across volumes.
08:57 kshlm That is used to allow things like self-heal daemon to connect to the bricks.
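[The getfattr output looks something like the line below (the hex value here is made up); the info file wants it reformatted as a dashed uuid in the usual 8-4-4-4-12 grouping:]
    # getfattr -m .volume-id -d -e hex /gluster/a/brick
    trusted.glusterfs.volume-id=0x1a2b3c4d5e6f70819203a4b5c6d7e8f9
    # -> volume-id=1a2b3c4d-5e6f-7081-9203-a4b5c6d7e8f9 in the info file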
08:58 tessier Ok, done.
08:59 kshlm You'll also need to recreate the brickinfo files in /var/lib/glusterd/vols/9a/bricks
08:59 kshlm Again, use 9j as the reference.
09:00 kshlm The brickinfo files will be named as a combination of the hostname and brick path.
09:00 kshlm In the actual files themselves, you'll need to correct the path and hostname.
09:01 kshlm Set the port to 0. GlusterD will select a proper non-clashing port for the brick when the brick port is zero.
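[A hedged sketch of the copy-and-edit approach using 9j as the template; the /gluster/j/brick path is a guess by analogy with the /gluster/a/brick path that shows up later in the log, so substitute the real brick paths and remember to zero the port field in each brickinfo file:]
    cd /var/lib/glusterd/vols
    mkdir -p 9a/bricks
    cp 9j/info 9a/info
    cp 9j/bricks/10.0.1.20:-gluster-j-brick 9a/bricks/10.0.1.20:-gluster-a-brick
    cp 9j/bricks/10.0.1.21:-gluster-j-brick 9a/bricks/10.0.1.21:-gluster-a-brick
    # fix the paths everywhere, then edit 9a/info for the volume-id
    # (from getfattr above) and set the port in 9a/bricks/* to 0
    sed -i 's|/gluster/j/brick|/gluster/a/brick|g' 9a/info 9a/bricks/*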
09:01 haomaiwa_ joined #gluster
09:04 tessier ok, got the info and brickinfo files done
09:04 tessier Changed a lot of j's to a's
09:05 ppai joined #gluster
09:05 social joined #gluster
09:07 tessier Now I need to restart glusterd. But there's a heal happening. Is that a bad idea or should it pick up where it left off?
09:07 kshlm Also change the `status` in the vol info file to `status=2`. This makes the volume stopped. GlusterD will not attempt to start it.
09:08 tessier ok
09:08 kshlm Just restarting glusterd shouldn't  affect the self-heal daemon.
09:08 kshlm But I don't know if glusterd will connect back to the running daemon.
09:09 kshlm So you might not be able to track progress.
09:09 kshlm How much healing do you have left?
09:09 Merlin_ joined #gluster
09:09 kshlm `gluster vol heal <name> info` should help.
09:09 Jules- joined #gluster
09:10 tessier A lot. 36 50G files.
09:10 JoeJulian restarting glusterd seems to restart the heal daemon
09:12 kshlm JoeJulian, Good to know. GlusterD has become too large to remember what happens inside.
09:12 kshlm JoeJulian, Would you also know if restarting healing will pick up from where it left off?
09:14 JoeJulian That I'm not sure of. I think it does try to pick up where the lock is.
09:15 Jules- What is the best way to put a two Node Replicated GlusterFS into Maintenance, let's say you need to shut one node down for maintenance. Temp Disabling the Quorum? Like setting: cluster.server-quorum-type none, cluster.quorum-type fixed, cluster.quorum-count 1 and revert that after second node back from maintenance?
09:16 JoeJulian "best"... add an arbiter.
09:17 JoeJulian Ok, I really can't be awake any longer.
09:17 tessier JoeJulian: Thanks, goodnight!
09:17 * JoeJulian drops the keyboard and walks off stage.
09:17 Jules- JoeJulian: sure but how, without an arbiter :-)
09:17 kshlm GN JoeJulian
09:18 tessier [2016-03-16 09:18:09.106545] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/9a/bricks/10.0.1.20:-gluster-a-brick, returned error: (No such file or directory)
09:18 tessier Hmm
09:18 dannyb joined #gluster
09:18 kshlm tessier, So you restarted Glusterd?
09:19 tessier ah....maybe I need to make the bricks dir...
09:19 tessier Yes
09:20 tessier I think I need to recreate those bricks/ files too...
09:20 kshlm Where did you create the brick info files?
09:20 kshlm I mentioned that the files in bricks also need to be recreated.
09:21 Merlin_ joined #gluster
09:21 kshlm ""
09:21 kshlm You'll also need to recreate the brickinfo files in /var/lib/glusterd/vols/9a/bricks
09:21 kshlm Again, use 9j as the reference.
09:21 kshlm The brickinfo files will be named as a combination of the hostname and brick path.
09:21 kshlm In the actual files themselves, you'll need to correct the path and hostname.
09:21 kshlm Set the port to 0. GlusterD will select a proper non-clashing port for the brick when the brick port is zero.
09:22 kshlm """
09:22 tessier Yep, done. And it started!
09:22 tessier Ok, I see what I need to do for the rest now.
09:22 kshlm Now you need to do a `gluster volume reset 9a`. That will regenerate all the volfiles.
09:22 kshlm You can then start the volume.
09:23 mhulsman joined #gluster
09:23 kshlm I hope this works out for you. I'm keeping my fingers crossed.
09:23 kshlm Good night tessier.
09:24 tessier Thanks! Goodnight!
09:24 kshlm Jules-, If you have quorum enabled, you'll need to disable it. Setting it to none will do it.
09:25 kshlm You can then kill all gluster processes on the node you're doing maintenance on.
09:25 kshlm When you restart gluster on the node, you'll need to wait for self-heal to complete before attempting maintenance on the other node.
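[A rough outline of that maintenance flow for a volume called myvol (placeholder name); the option names are the ones Jules- already listed, so cross-check them with `gluster volume set help`:]
    # before taking node A down: relax quorum so node B keeps serving
    gluster volume set myvol cluster.server-quorum-type none
    gluster volume set myvol cluster.quorum-type none
    # on node A: stop all gluster processes, do the maintenance, reboot
    systemctl stop glusterd
    pkill glusterfsd; pkill glusterfs
    # after node A is back: wait for the heal queue to drain
    gluster volume heal myvol info
    # then put the quorum settings back before touching node B
    gluster volume reset myvol cluster.server-quorum-type
    gluster volume reset myvol cluster.quorum-type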
09:30 hackman joined #gluster
09:32 nishanth joined #gluster
09:35 kovshenin joined #gluster
09:35 rafi1 joined #gluster
09:39 ramky joined #gluster
09:45 Merlin_ joined #gluster
09:52 rafi joined #gluster
10:00 Merlin_ joined #gluster
10:01 haomaiwa_ joined #gluster
10:04 spalai joined #gluster
10:15 Merlin_ joined #gluster
10:16 dlambrig_ joined #gluster
10:25 dannyb joined #gluster
10:36 gowtham joined #gluster
10:37 Merlin_ joined #gluster
10:40 spalai joined #gluster
10:40 [Enrico] joined #gluster
10:42 anti[Enrico] joined #gluster
10:48 Merlin_ joined #gluster
10:50 skoduri joined #gluster
10:50 anil_ joined #gluster
10:55 johnmilton joined #gluster
11:01 haomaiwa_ joined #gluster
11:04 Gnomethrower joined #gluster
11:09 nottc joined #gluster
11:20 nehar joined #gluster
11:24 haomaiwa_ joined #gluster
11:25 nehar_ joined #gluster
11:37 andy-b joined #gluster
11:42 ira_ joined #gluster
11:51 Merlin_ joined #gluster
11:51 the-me joined #gluster
11:52 Lee1092 joined #gluster
11:54 pjrebollo joined #gluster
11:54 chirino_m joined #gluster
11:55 wrouesnel1 joined #gluster
11:56 wrouesnel1 is it possible to control a glusterd with the cli tool remotely?
11:56 tbm_2 joined #gluster
11:57 wrouesnel1 ...yes, yes it is (always ask irc 1 link away from your answer)
12:00 shubhendu joined #gluster
12:01 Saravanakmr joined #gluster
12:01 poornimag joined #gluster
12:02 rastar joined #gluster
12:03 ppai_ joined #gluster
12:04 dlambrig_ joined #gluster
12:08 skoduri_ joined #gluster
12:10 Merlin_ joined #gluster
12:14 rastar joined #gluster
12:16 harish joined #gluster
12:18 surabhi joined #gluster
12:23 Merlin_ joined #gluster
12:24 EinstCrazy joined #gluster
12:26 unclemarc joined #gluster
12:28 kotreshhr left #gluster
12:30 arcolife joined #gluster
12:31 Slashman joined #gluster
12:37 poornimag joined #gluster
12:38 dlambrig_ joined #gluster
12:39 Saravanakmr joined #gluster
12:42 Merlin_ joined #gluster
12:43 kdhananjay joined #gluster
12:44 ppai_ joined #gluster
12:47 rafi joined #gluster
12:47 karnan_ joined #gluster
12:50 jiffin joined #gluster
12:53 Merlin_ joined #gluster
12:56 csaba joined #gluster
13:02 skoduri joined #gluster
13:06 Saravanakmr joined #gluster
13:06 nishanth joined #gluster
13:11 Merlin_ joined #gluster
13:16 ndarshan joined #gluster
13:18 dlambrig_ joined #gluster
13:21 anmol joined #gluster
13:26 skoduri joined #gluster
13:29 haomaiwa_ joined #gluster
13:38 robb_nl joined #gluster
13:40 hamiller joined #gluster
13:41 jiffin1 joined #gluster
13:42 liibert joined #gluster
13:43 spalai left #gluster
13:45 skoduri_ joined #gluster
13:45 liibert hi all, does somebody have experience running OpenVZ simfs VMs directly on glusterfs and/or using a distributed-disperse (13 x (4 + 2) = 78) volume over 20+ hosts with 3 bricks on every host
13:46 [Enrico] joined #gluster
13:47 Merlin_ joined #gluster
13:55 skylar joined #gluster
14:00 nbalacha joined #gluster
14:01 haomaiwa_ joined #gluster
14:02 amye joined #gluster
14:03 hchiramm_ joined #gluster
14:05 Merlin__ joined #gluster
14:06 hchiramm joined #gluster
14:06 p8952_ joined #gluster
14:15 rafi1 joined #gluster
14:16 hgichon joined #gluster
14:24 coredump joined #gluster
14:25 hchiramm joined #gluster
14:26 Saravanakmr_ joined #gluster
14:27 nishanth joined #gluster
14:29 Merlin_ joined #gluster
14:29 ndarshan joined #gluster
14:31 baojg joined #gluster
14:35 rafi joined #gluster
14:39 calavera joined #gluster
14:40 shyam joined #gluster
14:42 Merlin_ joined #gluster
14:50 farhorizon joined #gluster
14:52 hchiramm_ joined #gluster
15:00 ndarshan joined #gluster
15:00 nathwill joined #gluster
15:01 haomaiwa_ joined #gluster
15:04 rwheeler joined #gluster
15:10 nehar joined #gluster
15:15 jdarcy joined #gluster
15:16 Merlin_ joined #gluster
15:17 robb_nl joined #gluster
15:17 [Enrico] joined #gluster
15:17 nbalacha joined #gluster
15:23 shyam joined #gluster
15:34 Merlin_ joined #gluster
15:42 hgichon0 joined #gluster
15:43 NTmatter joined #gluster
15:44 spalai joined #gluster
15:47 Merlin_ joined #gluster
15:49 rastar joined #gluster
15:50 rastar joined #gluster
16:00 d0nn1e joined #gluster
16:01 haomaiwa_ joined #gluster
16:03 shyam joined #gluster
16:03 Manikandan joined #gluster
16:07 rastar joined #gluster
16:13 deniszh joined #gluster
16:15 Merlin_ joined #gluster
16:19 chirino joined #gluster
16:19 skoduri joined #gluster
16:25 nishanth joined #gluster
16:26 ggarg joined #gluster
16:44 bennyturns joined #gluster
16:53 Merlin_ joined #gluster
16:58 prasanth joined #gluster
17:01 haomaiwa_ joined #gluster
17:06 Merlin_ joined #gluster
17:10 jiffin joined #gluster
17:11 lliibert joined #gluster
17:12 Lauri__ joined #gluster
17:13 Apeksha joined #gluster
17:20 Apeksha joined #gluster
17:22 DV joined #gluster
17:23 spalai joined #gluster
17:23 Merlin_ joined #gluster
17:29 sage joined #gluster
17:29 hchiramm joined #gluster
17:31 hchiramm_ joined #gluster
17:32 karnan joined #gluster
17:35 calavera joined #gluster
17:49 ivan_rossi left #gluster
17:56 Merlin_ joined #gluster
17:59 Merlin_ joined #gluster
18:01 haomaiwa_ joined #gluster
18:07 shubhendu joined #gluster
18:21 hackman joined #gluster
18:25 nishanth joined #gluster
18:29 rastar joined #gluster
18:34 liibert joined #gluster
18:36 hagarth joined #gluster
18:37 Gaurav__ joined #gluster
18:39 spalai joined #gluster
18:44 karnan joined #gluster
18:51 robb_nl joined #gluster
18:54 ovaistariq joined #gluster
19:00 Merlin_ joined #gluster
19:01 haomaiwang joined #gluster
19:10 amye joined #gluster
19:36 spalai left #gluster
19:46 JoeJulian post-factum: I agree with your arbiter email assessment.
19:54 shlant joined #gluster
19:55 shlant hi all. I had gluster fs installed and working, then I deleted everything and uninstalled it. Now when I go to reinstall, it won't start and the logs give https://gist.github.com/MrMMorris/898644d433cf5efb4fa5
19:55 glusterbot Title: gist:898644d433cf5efb4fa5 · GitHub (at gist.github.com)
19:56 shlant I saw after uninstalling that /var/run/glusterd.socket was still there so I tried deleting that too and still the same thing
19:57 JoeJulian Was your intention to delete the configuration?
19:58 calavera joined #gluster
19:58 rafi joined #gluster
19:59 JoeJulian Uninstalls don't delete application state (stuff in /var/lib). If your intention was to start from scratch, you'll need to delete the stuff under /var/lib/glusterd.
19:59 shlant ah ok
19:59 shlant will try that
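[Roughly, assuming the goal really is to throw away every volume, peer and node UUID and start from scratch:]
    systemctl stop glusterd
    rm -rf /var/lib/glusterd/*
    systemctl start glusterd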
20:01 haomaiwa_ joined #gluster
20:08 cliluw joined #gluster
20:09 cliluw joined #gluster
20:11 post-factum JoeJulian: thanks for the support :)
20:11 post-factum JoeJulian: any extra thoughts on that?
20:24 gbox joined #gluster
20:27 shaunm joined #gluster
20:29 deniszh joined #gluster
20:30 hackman joined #gluster
20:33 gbox Hi I noticed some Input/Output errors on writes.  gluster volume heal gv0 info has produced a list of 1000+ gfid so far.  How unusual is this?
20:38 JoeJulian post-factum: Just wondering what 256 byte inodes might look like for that. And, by the same token, 1k inodes.
20:39 post-factum JoeJulian: I create 512 byte inodes, as doc suggests
20:39 post-factum JoeJulian: for the tests as well
20:39 JoeJulian Sure. That's why I'm curious.
20:39 post-factum JoeJulian: should I try 1k inodes instead?
20:40 JoeJulian gbox: If you've had an outage, it's not unusual. During normal day-to-day operations, fairly unusual.
20:41 JoeJulian post-factum: Doesn't hurt to try. My expectation is that disk usage will go up despite inode usage going down.
20:41 post-factum JoeJulian: ok, will check that
20:41 JoeJulian post-factum: and with 256, I'm curious if inode usage will stay close to the same while disk usage goes down.
20:42 post-factum JoeJulian: ok
20:43 bennyturns joined #gluster
20:46 rauchrob joined #gluster
20:51 bennyturns joined #gluster
20:58 rauchrob Hey, I'm running a two-node test setup with CentOS 7.2 and Gluster 3.7.8. My bricks are hardware RAID5 devices with a standard XFS, connected over 10GbE. I have a raw read/write throughput of approx 500MB/s on the individual bricks. When creating a distributed volume over those bricks and mounting it via Gluster-FUSE on a client system, I get write performance over 500MB/s. However, when using a similar replica-2 volume, my write speed drops down to around
20:58 rauchrob 40MB/s. The same holds true if both bricks are on the same single system, which is of course not optimal, but shows that this is not a networking issue. Any idea on what's wrong would be great! :(
20:58 gbox JoeJulian: thanks, no outage.  I copied about 10TB of files over (all sizes).  Should I look at a particular log file?
20:59 JoeJulian gbox: If you know when it started, I'd check the client logs around that time.
21:00 hgichon joined #gluster
21:01 haomaiwang joined #gluster
21:05 gbox A whole lot of "remote operation failed [Transport endpoint is not connected]"
21:06 JoeJulian "Transport endpoint not connected" is usually because either the port isn't listening, it's firewalled, or there's a network issue.
21:06 toppy joined #gluster
21:07 JoeJulian Check "gluster volume status" "gluster peer status" and see if you can open a tcp connection to the brick ports (I use nc or telnet).
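[Something along these lines, where the brick port comes from the volume status output (49152 is just the first port gluster normally hands out) and brick-host is a placeholder:]
    gluster volume status gv0
    gluster peer status
    nc -vz brick-host 49152     # or: telnet brick-host 49152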
21:09 toppy We currently use a Samba share at work as a general purpose data file server... and we have two office locations. The samba share is located in one office. The other office access the samba share over VPN... however, this is slow. I'm wondering if glusterfs would be a solution to put our data file server in both office locations for faster access?
21:09 shaunm joined #gluster
21:10 petanb joined #gluster
21:12 chirino joined #gluster
21:13 JoeJulian not really, unless one of them can be read-only.
21:14 JoeJulian What you need is a quantum entangled network so you can have no latency across distances.
21:14 toppy JoeJulian: and what is the reason that glusterfs isn't suitable for this?
21:14 toppy lol
21:15 JoeJulian Latency. Gluster's replication is done from the client and writes are synchronous to each replica.
21:15 JoeJulian So you're still going to get hit with your latency problem during writes and lookup().
21:16 toppy from the client? ahh.. I didn't know that... so client is writing to each replica?
21:16 JoeJulian Correct
21:16 toppy but reading would still be much quicker...
21:16 JoeJulian There's a new journal-based replication that's being worked on that might give you all that, and a pony.
21:17 toppy what's that system called?
21:17 JoeJulian Yes, reads should come from the local replica.
21:17 JoeJulian Ok, I'm going to tell you, but you can't let anyone know I told you in this channel....
21:17 JoeJulian ready???
21:17 toppy maybe we could tolerate slower writes... what about file locking?
21:17 JoeJulian It's glusterfs.
21:17 toppy lol
21:17 toppy so... still to come....
21:17 JoeJulian Yeah, locks would probably suck too.
21:18 JoeJulian eager-locking might work for you.
21:18 JoeJulian I've always had the best luck with samba just turning off all the locking.
21:19 toppy really need a solution to essentially have fast access to our file server in two locations
21:19 JoeJulian http://www.gluster.org/community/documentation/index.php/Features/new-style-replication
21:19 JoeJulian Use google docs instead of office. ;)
21:20 toppy well, it's not just word docs and such... lots and lots of different filetypes
21:21 JoeJulian Yeah, I'm being facetious anyway. It's a tough problem to solve and nobody's solved it in any really good way.
21:21 toppy I've looked at xtreemfs before
21:21 toppy couldn't really get it working right a few years ago when I tried
21:21 harish joined #gluster
21:21 JoeJulian Yeah, that's what I've heard about it.
21:22 JoeJulian It might be worth trying out cephfs, too. With a properly built crushmap it might work. They've just recently announced that cephfs is supposed to be production ready.
21:22 toppy ah.. ok
21:23 JoeJulian "recently announced" and "supposed to be" worry me.
21:23 JoeJulian And, of course, any time a developer is the one that certifies it as production ready.
21:23 toppy Important CephFS currently lacks a robust ‘fsck’ check and repair function. Please use caution when storing important data as the disaster recovery tools are still under development. For more information about using CephFS today, see CephFS for early adopters
21:23 toppy lol
21:26 JoeJulian Oh, toppy, you're practically a neighbor. :D
21:27 JoeJulian I'm so used to talking around the globe, it's kind-of a novelty when another local pops in.
21:27 toppy where are you at?
21:27 JoeJulian Edmonds
21:28 toppy ah.. yes.. you are very close
21:28 toppy I'm just across the water
21:29 rauchrob Hey, I'm currently testing a two-node setup with CentOS 7.2 and Gluster 3.7.8 connected over 10GbE. My bricks are hardware RAID5 devices with a standard XFS. I have a raw read/write throughput of approx 500MB/s on the individual bricks. When creating a distributed volume over those bricks and mounting it via Gluster-FUSE on a client system, I get write performance of around 500MB/s, as expected. However, when using a similar replica-2 volume, my
21:29 rauchrob write speed drops down to around 40MB/s. The same holds true if both bricks are on the same single system, which is of course not optimal, but shows that this is not a networking issue. Any idea what's wrong would be great, I've been struggling with this for about a week now...
21:31 JoeJulian rauchrob: I'm guessing it's but 1309462
21:31 JoeJulian er, bug 1309462
21:31 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1309462 low, unspecified, ---, ravishankar, MODIFIED , Upgrade from 3.7.6 to 3.7.8 causes massive drop in write performance.  Fresh install of 3.7.8 also has low write performance
21:33 rauchrob JoeJulian: thanks for the hint...
21:34 toppy pull a bug number out like that? impressive
21:34 JoeJulian Looks like performance.write-behind off and data-self-heal off were needed.
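[For rauchrob's setup that would translate to something like the following, with repl2vol a placeholder volume name; the self-heal option is cluster.data-self-heal if memory serves, so verify with `gluster volume set help`:]
    gluster volume set repl2vol performance.write-behind off
    gluster volume set repl2vol cluster.data-self-heal off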
21:34 JoeJulian toppy :) Thanks.
21:34 JoeJulian I can't take all the credit, though. That's the bugzilla plugin to supybot.
21:35 gbox JoeJulian: ports check out OK. client just fails to write to one peer.  It's a 2 replica, distributed volume.  I assume only one peer has a copy?
21:35 JoeJulian gbox: yes
21:36 gbox The client is a peer (the replica of the other peers)
21:37 JoeJulian Is that client not connecting to itself or to its peer?
21:37 johnmilton joined #gluster
21:37 JoeJulian ~pastestatus | gbox
21:37 glusterbot gbox: Please paste the output of gluster peer status from more than one server to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
21:37 rauchrob JoeJulian, so
21:37 JoeJulian no, that's not what I wanted.
21:38 lkoranda joined #gluster
21:38 rauchrob JoeJulian: I will check out the workaround. If I understand it correctly this bug will be fixed in 3.7.9
21:38 JoeJulian gbox: I was thinking gluster volume status (from one peer is sufficient).
21:38 JoeJulian rauchrob: That's my understanding, yes.
21:38 rauchrob JoeJulian: so I'm going to check out the suggested workaround...
21:39 cyberbootje joined #gluster
21:42 gbox http://fpaste.org/341237/raw/
21:43 gbox gluster volume heal gv0 info has produced 790783 entries (all from today)
21:43 rauchrob JoeJulian: It looks like it works! Great!
21:46 csaba joined #gluster
21:49 JoeJulian gbox: which machine is this happening on, and is it failing to connect to itself or its peer?
21:50 gbox JoeJulian: painrepo.  I think it's failing to connect to another peer
21:50 ctria joined #gluster
21:50 JoeJulian Show one whole error line.
21:50 gbox The errors state client3_3_lookup_cbk but the client seems to be 0-gv0-client-2
21:50 JoeJulian That's not an error.
21:51 JoeJulian client-2 is the third brick (numbering from 0)
21:51 DV joined #gluster
21:51 JoeJulian So see if you can "telnet gluster-cns4 49152"
21:52 JoeJulian (from the machine that's throwing the ETRANS)
21:52 JoeJulian s/ETRANS/ENOTCONN/
21:52 glusterbot What JoeJulian meant to say was: (from the machine that's throwing the ENOTCONN)
21:53 rauchrob JoeJulian: What are the implications of turning performance.write-behind and data-self-heal off in a production environment?
21:54 rauchrob ... we are currently evaluating Gluster as oVirt Backend
21:54 johnmilton joined #gluster
21:56 gbox Connection to gluster-cns4 49152 port [tcp/*] succeeded!
21:59 gbox The ports seem OK
21:59 shyam joined #gluster
21:59 JoeJulian rauchrob: I prefer turning off the self-heals in production. That leaves them to the self-heal daemon instead of allowing clients to do them.
22:00 JoeJulian gbox: then I guess I would try unmounting and remounting.
22:00 JoeJulian There's that or there's restarting the brick on cns4.
22:00 JoeJulian restarting the brick is more likely the solution.
22:00 gbox just restarting glusterd?
22:01 rauchrob I can't find any documentation on these volume options, though...
22:01 JoeJulian gbox: No, glusterd is just the management daemon.
22:01 JoeJulian gbox: that 'volume status' command showed you the pid for the brick.
22:01 haomaiwa_ joined #gluster
22:02 JoeJulian http://gluster.readthedocs.org/en/latest/Administrator%20Guide/Managing%20Volumes/#tuning-options
22:02 glusterbot Title: Managing Volumes - Gluster Docs (at gluster.readthedocs.org)
22:02 gbox Ha, the process with that PID does not appear to be running
22:03 JoeJulian Then what's listening on that port?
22:04 gbox It's there. I had ssh'd into another server on that terminal
22:04 JoeJulian hehe, I hate when I do that.
22:04 rauchrob JoeJulian: I have already had a look at these, but the Parameters performance.write-behind and data-self-heal are not explained
22:05 JoeJulian rauchrob: Huh, well I'll be.... "gluster volume set help"
22:05 rauchrob Ah... thanks!
22:06 JoeJulian I'm not sure if I love it or hate it when the application help text is better than the documentation.
22:07 rauchrob Can you suggest some book on Gluster for the Ops people? :)
22:08 JoeJulian I'm the wrong guy to ask. I'm not a big fan of print for this stuff. It moves so fast that by the time a book is written, it's out of date.
22:18 ilbot3 joined #gluster
22:18 Topic for #gluster is now Gluster Community - http://gluster.org | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
22:19 gbox JoeJulian: You had great intuition on this.  Restarting the brick worked.
22:20 gbox I didn't restart glusterd though, and I see this on the problem peer: W [socket.c:588:__socket_rwv] 0-management: readv on /var/run/gluster/3560b2713654a96df485cba4a415b1d2.socket failed (Invalid argument)
22:21 dabukalam left #gluster
22:34 JoeJulian Ah yes, you should have let glusterd start the brick. "volume start...force" or restarting glusterd. I would see if restarting glusterd fixes that.
22:35 toppy left #gluster
22:38 gbox It did, now seeing a lot of "Gfid mismatch detected for ... Skipping conservative merge on the file."
22:39 robb_nl joined #gluster
22:39 gbox Ah 'gluster volume start gv0 force'
22:46 JoeJulian gfid mismatches mean the files on both bricks were assigned a gfid (the equivalent of an inode number) separately. That could mean split-brain. Gluster will figure out if it can fix it. Look for split-brain notifications in "gluster volume heal $vol info"
22:49 gbox Running "gluster volume heal $vol info" earlier listed 790783 entries, but none indicated split-brain.  Is it possible there is simply no gfid on the brick that had been zombied?  That would still be a mismatch.
22:51 gbox Lots of self-heal log entries on the client.  Does the client need to access (read/write) a file to initiate self-heal?
22:52 JoeJulian It wouldn't have known if they were split-brain before. It couldn't check the missing brick.
22:52 JoeJulian No client needed. There's an index that the self-heal daemon will crawl.
22:53 gbox Yes that makes sense.  Gluster functions as expected.  Is there a way to watch for that zombie peer problem?
22:56 gbox Ah I saw this all morning: W [fuse-bridge.c:462:fuse_entry_cbk] 0-glusterfs-fuse: 45877734: LOOKUP() /data/repo_push_dir.log => -1 (Input/output error)
22:58 gbox That log file is written to a lot and it's on the problematic peer.
23:01 haomaiwang joined #gluster
23:15 johnmilton joined #gluster
23:40 hagarth joined #gluster
