IRC log for #gluster, 2015-08-21


All times shown according to UTC.

Time Nick Message
00:00 JoeJulian right
00:03 RedW joined #gluster
00:04 primehaxor @JoeJulian when i start glusterd with nfs kernel stopped i receive clnt_create: RPC: Program not registered
00:05 JoeJulian I wish ndevos was around. He knows this part better.
00:09 JoeJulian That's what I was looking for: rpcinfo | nc termbin.com 9999
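(A hedged sketch of the sort of diagnosis being started here: "clnt_create: RPC: Program not registered" usually means the Gluster NFS server never registered with rpcbind, so check the registrations and let it re-register; the service/unit names below are assumptions:)

    rpcinfo -p                       # is anything registered for nfs/mountd?
    systemctl status rpcbind         # rpcbind must be up before glusterd starts its NFS server
    systemctl restart glusterd       # respawns the Gluster NFS process so it can register
    showmount -e localhost           # should list the Gluster volumes once registration works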
00:09 cyberbootje joined #gluster
00:10 jermudgeon So I stopped all gluster process on a 2-node cluster (glusterd, glusterfsd, and glusterfs). Updated to 3.7. Started. bitrot enable says I’m still at 30600
00:16 jermudgeon Is the split-brain preventing update of the cluster version?
00:16 jermudgeon It has literally two files on it, one split.
00:17 jermudgeon Same mod times, so I think it’s just a metadata split, not a data split.
00:17 JoeJulian jermudgeon: gluster volume set all cluster.op-version 30701
00:18 JoeJulian You can hash the file (or a reasonably large chunk of it) to compare them. I'll often hash head and tail of a file if it's too large to do the whole thing reasonably.
00:20 jermudgeon Will do. Version set worked. Thanks
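(A minimal sketch of the head/tail hashing JoeJulian suggests, run against the same file path on each brick; the file path and chunk size are arbitrary examples:)

    F=/data/brick1/vm.img
    head -c 100M "$F" | md5sum       # compare this between the two bricks
    tail -c 100M "$F" | md5sum       # and this; if both match, the copies most likely agree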
00:32 cholcombe joined #gluster
00:35 jermudgeon OK, so how do I make splitbrain obsolete? :) I’m still showing a split-brain file, heal is enabled.
00:37 JoeJulian s/splitbrain/splitmount/
00:37 glusterbot What JoeJulian meant to say was: jermudgeon: no. That page has new features as of 3.7 that will make splitmount obsolete.
00:38 * JoeJulian mumbles about documentation again...
00:39 jermudgeon Ah ha. OK. So I still need to resolve this manually. MD5 did show differences. It’s a fs image file, and should have a journal, so I’ll make two copies and mount them loop and check, unless you have a better idea.
00:39 jermudgeon My eyes glazed over a bit about comparing attribs.
00:39 jermudgeon It’s too late in the day for that, I suspect
00:39 JoeJulian The funny part is... that's pranith's document and he's always harping on documentation.
00:41 JoeJulian Somebody needs to learn the difference between "table of contents" and "index" sheesh.
00:43 JoeJulian jermudgeon: https://github.com/gluster/glusterfs/blob/release-3.7/doc/features/heal-info-and-split-brain-resolution.md
00:43 glusterbot Title: glusterfs/heal-info-and-split-brain-resolution.md at release-3.7 · gluster/glusterfs · GitHub (at github.com)
00:45 nangthang joined #gluster
00:48 jermudgeon JoeJulian: thanks.
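(For reference, the 3.7 document linked above describes CLI-based split-brain resolution along these lines; the volume name, brick and file path are placeholders:)

    gluster volume heal VOLNAME info split-brain
    gluster volume heal VOLNAME split-brain bigger-file /path/within/volume/file
    gluster volume heal VOLNAME split-brain source-brick server1:/data/brick1 /path/within/volume/file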
00:53 jdossey joined #gluster
00:55 primehaxor joined #gluster
01:05 prg3 joined #gluster
01:11 dlambrig joined #gluster
01:14 dlambrig_ joined #gluster
01:19 RedW joined #gluster
01:20 davidself joined #gluster
01:36 harish joined #gluster
01:39 Lee1092 joined #gluster
01:45 dlambrig joined #gluster
01:55 haomaiwa_ joined #gluster
02:02 MugginsM joined #gluster
02:09 nangthang joined #gluster
02:10 haomaiwang joined #gluster
02:14 bennyturns joined #gluster
02:17 nangthang joined #gluster
02:27 harish joined #gluster
02:30 sankarshan joined #gluster
02:41 MugginsM joined #gluster
02:41 dlambrig joined #gluster
02:47 haomaiwa_ joined #gluster
02:54 RameshN joined #gluster
02:55 bharata joined #gluster
02:59 haomaiwa_ joined #gluster
03:01 haomaiwang joined #gluster
03:01 baojg joined #gluster
03:02 mjrosenb one of my clients doesn't seem to want to talk to all of my bricks
03:02 mjrosenb is it possible to get it to re-establish contact without unmounting the filesystem?
03:03 JoeJulian I've not had much luck with that. You might be able to if you change a volume setting that changes the client vol file, like a buffer size or something.
03:03 JoeJulian That will cause it to reload the vol file.
03:03 JoeJulian for me: file a bug
03:03 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
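(A sketch of the workaround JoeJulian describes: changing any option that is written into the client volfile makes connected clients reload it. The volume name and values here are arbitrary examples:)

    gluster volume set myvol performance.cache-size 64MB
    # setting it back afterwards triggers another reload
    gluster volume set myvol performance.cache-size 32MB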
03:05 shyam joined #gluster
03:05 mjrosenb I could also just wait until I have some free time tomorrow, and upgrade all of my machines from 3.3 to 3.6
03:09 JoeJulian +1
03:10 JoeJulian Though I was talking to Pranith yesterday at Linuxcon. He would recommend going straight to 3.7.3.
03:11 harish joined #gluster
03:12 dlambrig joined #gluster
03:17 harish joined #gluster
03:18 haomaiwang joined #gluster
03:19 vmallika joined #gluster
03:23 pdrakeweb joined #gluster
03:24 baojg joined #gluster
03:25 plarsen joined #gluster
03:33 [7] joined #gluster
03:37 MugginsM joined #gluster
03:38 nbalacha joined #gluster
03:39 atinm joined #gluster
03:40 shubhendu joined #gluster
03:46 spcmastertim joined #gluster
03:48 baojg joined #gluster
03:51 ppai joined #gluster
03:52 sakshi joined #gluster
03:57 itisravi joined #gluster
04:07 kkeithley1 joined #gluster
04:13 dlambrig joined #gluster
04:15 sripathi joined #gluster
04:20 nishanth joined #gluster
04:24 jiffin joined #gluster
04:24 hgowtham joined #gluster
04:25 ashiq joined #gluster
04:25 meghanam joined #gluster
04:27 deepakcs joined #gluster
04:33 javi404 joined #gluster
04:34 meghanam joined #gluster
04:52 arcolife joined #gluster
04:55 gem joined #gluster
05:09 spalai joined #gluster
05:13 skoduri joined #gluster
05:15 ndarshan joined #gluster
05:15 vimal joined #gluster
05:17 spalai left #gluster
05:24 poornimag joined #gluster
05:25 arcolife joined #gluster
05:31 pppp joined #gluster
05:31 kotreshhr joined #gluster
05:31 topshare joined #gluster
05:32 kdhananjay joined #gluster
05:33 karnan joined #gluster
05:37 Bhaskarakiran joined #gluster
05:40 pppp joined #gluster
05:41 Zhang joined #gluster
05:42 jdossey joined #gluster
05:44 atalur joined #gluster
05:49 cvstealth joined #gluster
05:52 Manikandan joined #gluster
05:52 jdossey joined #gluster
05:53 sankarshan_away joined #gluster
05:53 sankarshan joined #gluster
05:57 jwd joined #gluster
06:03 baojg joined #gluster
06:06 Manikandan joined #gluster
06:08 raghu joined #gluster
06:10 maveric_amitc_ joined #gluster
06:18 dlambrig joined #gluster
06:22 ashiq- joined #gluster
06:22 ashiq- joined #gluster
06:24 vimal joined #gluster
06:25 gletessier joined #gluster
06:26 skoduri joined #gluster
06:27 kkeithley1 joined #gluster
06:27 jtux joined #gluster
06:28 gletessier_ joined #gluster
06:28 gletessier_ joined #gluster
06:31 nangthang joined #gluster
06:36 jcastill1 joined #gluster
06:42 jcastillo joined #gluster
06:45 karnan joined #gluster
06:47 nishanth joined #gluster
06:49 Bhaskarakiran joined #gluster
07:03 ashiq- joined #gluster
07:04 arcolife joined #gluster
07:05 ashiq- joined #gluster
07:08 baojg joined #gluster
07:14 vmallika joined #gluster
07:15 sripathi joined #gluster
07:20 gem joined #gluster
07:24 mlhess joined #gluster
07:32 kkeithley1 joined #gluster
07:40 Alex31 rastar_afk: yop ... when you are back, i have some few question :)
07:41 rastar Alex31: I am here for short while before I go for lunch
07:41 Alex31 rastar: which country you are ?
07:41 rastar Alex31: India
07:41 Alex31 rastar: great
07:42 Alex31 rastar: so, i'm going to be short
07:42 Alex31 rastar: you're right, the bandwith and response time was the problem
07:43 Alex31 rastar:  I'm trying to reduce this problem because I will have an architecture with branch offices where the bandwidth is 2 or 4 Mbps
07:44 rastar Alex31: Yes, more important is the response time
07:44 kshlm joined #gluster
07:44 Alex31 is it possible to enable an option so that when I start a copy from Windows onto the GFS volume, the files are copied to a cache first and then replicated to all the nodes?
07:44 RameshN joined #gluster
07:45 rastar Alex31: what is the round trip time between 1. samba server and other gluster nodes 2. between windows machine and Samba server?
07:46 Alex31 rastar:  round trip time ?
07:46 Alex31 rastar: a ok ... thks you google translate
07:47 Slashman joined #gluster
07:47 Alex31 France => Tunisia, the ping is 47ms
07:48 kshlm joined #gluster
07:48 sahina joined #gluster
07:49 rastar Alex31: ok, so france is where Samba server is(Fedora 22) and Tunisia is where other Gluster bricks(Debian) are?
07:49 rastar Alex31: windows machine is in France?
07:49 haomaiwa_ joined #gluster
07:51 Alex31 rastar: for my test, I use only Debian servers on version 3.6. I have 3 bricks in France and one in Tunisia. My goal is to replicate the files to each of my company's sites so that every user has a good response time when they want to work on a shared file
07:51 Alex31 so samba server are on each brick
07:52 Alex31 rastar: do you think I have to increase performance.write-behind-window-size
07:52 Alex31 rastar: actually at 32MB
07:53 rastar Alex31: You were right yesterday...you need geo-rep for such setup.
07:53 Alex31 rastar: but geo replication is only on one way no ?
07:53 rastar And even then it is eventual consistency between the two sites
07:53 rastar Alex31: yes, it is master/slave.. not master/master
07:54 Alex31 rastar: is there a mechanism for read on the slave copy and write on the master ?
07:55 rastar Alex31: I don't understand the requirements of geo-rep well..
07:56 rastar Alex31: probably you should talk to kotreshhr
07:56 Alex31 rastar: me too ....  with geo-replication, if the HQ replicates to 4 sites, what happens when a user at one of the sites opens a document, modifies it and saves it? Where is the document saved ....
07:57 Alex31 rastar: ok, i'm going to ask kotreshhr :)
07:57 rastar Alex31: in geo-rep the document is saved on all bricks of the master site immediately
07:58 rastar Alex31: and it will be copied to slave site in next few minutes..few mins is proportional to ping times and bandwidth between master and slave sites.
07:58 rastar Alex31: basic setup will  involve create a volume with all bricks in France and then creating another volume with bricks in Tunisia.
07:59 rastar Alex31: One will be master and other will be slave.
07:59 hgowtham joined #gluster
08:00 rastar Alex31: From analysis of yesterday's data, If you are using SMB, please ensure that ping times between "windows machine and samba server" and between "samba machine and all gluster servers" is less than or around  1 ms.
08:01 rastar Alex31: otherwise you will keep seeing performance problems with SMB.
08:01 ctria joined #gluster
08:02 Alex31 rastar: Ok, I'm going to have a look on all of that
08:02 haomaiwa_ joined #gluster
08:02 Alex31 rastar_away: thanks
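(A rough outline of the setup rastar describes: one volume per site, with geo-replication from the master volume to the slave volume. Volume names, hostnames and brick paths are illustrative, and passwordless ssh plus 'gluster system:: execute gsec_create' are assumed prerequisites:)

    # on the France (master) side
    gluster volume create fr-vol replica 3 fr1:/bricks/v1 fr2:/bricks/v1 fr3:/bricks/v1
    gluster volume start fr-vol
    # on the Tunisia (slave) side, create and start a slave volume (e.g. tn-vol), then:
    gluster volume geo-replication fr-vol tn1::tn-vol create push-pem
    gluster volume geo-replication fr-vol tn1::tn-vol start
    gluster volume geo-replication fr-vol tn1::tn-vol status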
08:03 primusinterpares joined #gluster
08:10 cppking joined #gluster
08:10 farblue joined #gluster
08:12 cppking I got a problem with samba-vfs-glusterfs: when using this plugin, I can't mount the volume from another replica node over CIFS when one replica node is down, while the native and NFS protocols can
08:14 farblue morning all :) I’m new to Gluster so I’m after some noob advice :) I’m thinking of using Gluster to create a unified storage layer across 5 servers that will be serving as a ‘cloud’ for docker container deployment. I’d like to support 2 out of the 5 machines being offline at any time. The 5 machines have 2 SSD drives each, either pairs of 60Gb 93 machines) or pairs of 120Gb (2 machines). I’ll need the OS installs on the machines to be RA
08:14 farblue mirrored. Could someone possibly help me understand the best approach for setting up gluster volumes across the machines?
08:14 farblue s/93/3/
08:14 glusterbot What farblue meant to say was: morning all :) I’m new to Gluster so I’m after some noob advice :) I’m thinking of using Gluster to create a unified storage layer across 5 servers that will be serving as a ‘cloud’ for docker container deployment. I’d like to support 2 out of the 5 machines being offline at any time. The 5 machines have 2 SSD drives each, either pairs of 60Gb
08:14 glusterbot 3 machines) or pairs of 120Gb (2 machines). I’ll need the OS installs on the machines to be RAI
08:24 Norky joined #gluster
08:34 ramky joined #gluster
08:34 kshlm joined #gluster
08:37 javor joined #gluster
08:50 atinm joined #gluster
08:55 sripathi joined #gluster
09:01 LebedevRI joined #gluster
09:01 cppking I got a problem about scsi plugin for gluster https://bpaste.net/show/9d7dc220c395
09:01 cppking build for EL6
09:01 glusterbot Title: show at bpaste (at bpaste.net)
09:02 haomaiwa_ joined #gluster
09:03 cppking ppai:  are you there ?
09:03 sripathi joined #gluster
09:03 ppai cppking, I am now
09:04 cppking https://bpaste.net/show/9d7dc220c395
09:04 glusterbot Title: show at bpaste (at bpaste.net)
09:05 cppking it's about scsi-plugin-glusterf
09:05 cppking build on EL6
09:06 ppai cppking, dlambrig may be able to help you out with it
09:06 hgowtham joined #gluster
09:06 ppai cppking, just across http://blog.gluster.org/tag/block/ posts and he's the author
09:07 cppking dlambrig:  are you there?
09:07 cppking thx ppai
09:07 s19n joined #gluster
09:08 gem joined #gluster
09:08 cppking ppai:  thx a lot
09:09 kotreshhr Alex31: To keep it brief, Whatever written in master volume(site1) will be replicated to slave volume (site2). slave is used as read only volume. As said, it is not master-master replication. Let me know if I can help u in anyway.
09:09 anil joined #gluster
09:20 skoduri joined #gluster
09:23 atinm joined #gluster
09:26 Alex31 kotreshhr: thanks you... So, I think, geo-repl is not the functionality for me...
09:27 Alex31 kotreshhr: is there a way for writing in cache before writing on the volume ?
09:27 baojg joined #gluster
09:28 Alex31 kotreshhr: my goal is to make the time taken to save a document on the GFS volume more transparent for the user
09:28 Alex31 kotreshhr: I hope you understand me :)
09:29 kotreshhr Alex31: geo-rep is just an application over glusterfs which enables non synchronous replication between master and slave gluster volumes. It doesn't interfere the writes on gluster volume.
09:32 shubhendu joined #gluster
09:35 vmallika joined #gluster
09:36 vmallika joined #gluster
09:36 kshlm joined #gluster
09:37 vmallika joined #gluster
09:37 vmallika joined #gluster
09:38 vmallika joined #gluster
09:38 Alex31 kotreshhr: so, I'm going to stay with replication. Is there a way to improve the synchronisation time between bricks linked over WAN?
09:40 kotreshhr Alex31: sry replication meaning AFR?
09:40 vmallika joined #gluster
09:42 Alex31 kotreshhr:  mhmm, not sure what is AFR :)
09:43 Alex31 kotreshhr: with replication, I mean create the volume like this: gluster volume create SHARED_GFSVOL1 replica 3 drbd01:/mnt/SHARED_GFSVOL1  drbd03:/mnt/SHARED_GFSVOL1 drbd04:/mnt/SHARED_GFSVOL1 force
09:43 kotreshhr Alex31: :) that is AFR.
09:43 kaushal_ joined #gluster
09:43 Alex31 kotreshhr: ok :)
09:44 Alex31 kotreshhr: so, my test environnment is with 3 bricks in AFR. Two are on a local network Gigabits, and the three is a site over WAN accessible over VPN
09:46 Alex31 kotreshhr: the bandwidth is not good... 2 or 4 Mbps. I'm looking for a way to have replication between these 3 nodes (or more) with write caching, to make the synchronisation time transparent for the user
09:47 cyberbootje joined #gluster
09:47 Alex31 kotreshhr: do you think, performance.cache-size , performance.write-behind-window-size, are the good option to play with for having this result ?
09:49 kotreshhr Alex31: I don't understand AFR well. So I am not the right person to comment on it. I don't find AFR guys. Just shoot a mail to gluster-user list. They would definitely help you.
09:50 Alex31 kotreshhr: ok, i'm going to do this
09:50 Alex31 kotreshhr: thanks you :)
09:50 kotreshhr Alex31: Welc:)
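(For reference, the options Alex31 mentions are set like this; the values are only examples and, as discussed above, they will not hide WAN-level latency on a synchronously replicated volume:)

    gluster volume set SHARED_GFSVOL1 performance.write-behind on
    gluster volume set SHARED_GFSVOL1 performance.write-behind-window-size 4MB
    gluster volume set SHARED_GFSVOL1 performance.cache-size 256MB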
09:56 gem joined #gluster
10:02 haomaiwang joined #gluster
10:02 kdhananjay joined #gluster
10:09 kaushal_ joined #gluster
10:11 gem_ joined #gluster
10:28 kotreshhr left #gluster
10:37 skoduri joined #gluster
10:52 ndarshan joined #gluster
10:52 jrm16020 joined #gluster
10:59 farblue hi all, I’d like to run by you guys my thoughts on how I could deploy gluster. I have 5 servers, each with 2 disks. I was going to partition each disk and create a raid setup for the OS. Then I thought I’d create a replicated or dispersed volume across all the ‘disk a’ disks on the servers and the same for ‘disk b’ disks and then combine them into a distributed set to create 1 overall storage space. Does that sound sensible?
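(farblue's question goes unanswered in this log; purely as a hedged illustration of the dispersed option he mentions, a distributed-dispersed volume with redundancy 2 would tolerate two of the five servers being offline. Hostnames and brick paths below are made up:)

    gluster volume create pool disperse 5 redundancy 2 \
        srv{1..5}:/bricks/a/brick \
        srv{1..5}:/bricks/b/brick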
11:02 haomaiwa_ joined #gluster
11:16 ndarshan joined #gluster
11:17 jrm16020 joined #gluster
11:19 ira joined #gluster
11:36 firemanxbr joined #gluster
11:37 jrm16020 joined #gluster
11:54 unclemarc joined #gluster
12:00 auzty joined #gluster
12:03 jrm16020 joined #gluster
12:03 jrm16020 joined #gluster
12:11 jtux joined #gluster
12:17 firemanxbr joined #gluster
12:20 RameshN joined #gluster
12:23 jrm16020 joined #gluster
12:27 B21956 joined #gluster
12:30 chirino joined #gluster
12:34 plarsen joined #gluster
12:49 k-ma joined #gluster
12:49 julim joined #gluster
12:49 ccha joined #gluster
12:52 pdrakeweb joined #gluster
12:59 sakshi joined #gluster
13:04 ashiq- joined #gluster
13:04 ashiq joined #gluster
13:06 siel joined #gluster
13:11 harish joined #gluster
13:16 shaunm joined #gluster
13:17 ashiq- joined #gluster
13:27 nangthang joined #gluster
13:33 _Bryan_ joined #gluster
13:33 DV joined #gluster
13:36 dgandhi joined #gluster
13:43 harold joined #gluster
13:52 fsimonce joined #gluster
13:55 MessedUpHare joined #gluster
13:55 Twistedgrim joined #gluster
13:57 shyam joined #gluster
13:57 spcmastertim joined #gluster
14:02 kanagaraj joined #gluster
14:04 pdrakeweb joined #gluster
14:12 sahina joined #gluster
14:15 nishanth joined #gluster
14:18 cholcombe joined #gluster
14:25 anil joined #gluster
14:25 calisto joined #gluster
14:26 haomaiwa_ joined #gluster
14:28 sahina joined #gluster
14:31 pdrakeweb joined #gluster
14:37 sahina joined #gluster
14:39 cglcd joined #gluster
14:40 cglcd Hey guys - I've got a medium (1.2 TB) cluster with ~3k split-brained files here
14:40 cglcd and most of the files are in directories which are also split-brained
14:40 cglcd (gluster 3.4)
14:41 cglcd if I fix the files, does this fix the directories?
14:42 cglcd if I fix the directories by clearing the changelogs, it forcibly copies the files from one brick to the other
14:43 _maserati joined #gluster
14:46 mckaymatt joined #gluster
14:57 mckaymatt joined #gluster
15:01 neofob joined #gluster
15:02 haomaiwa_ joined #gluster
15:03 bennyturns joined #gluster
15:04 cyberswat joined #gluster
15:06 jbrooks joined #gluster
15:11 shyam joined #gluster
15:22 tturkleton joined #gluster
15:24 tturkleton Hey folks. I have a question about healing replicated bricks in a scenario where there is a catastrophic failure of one of the nodes. I've noticed that the full heal kills performance. Can this be partially mitigated by rsyncing the data from one brick to the other before the heal?
15:43 chuz04arley_ joined #gluster
15:44 pdrakeweb joined #gluster
15:55 jcastill1 joined #gluster
15:58 vmallika joined #gluster
16:00 jcastillo joined #gluster
16:02 haomaiwa_ joined #gluster
16:11 jdossey joined #gluster
16:16 JoeJulian No
16:16 JoeJulian tturkleton: ^
16:16 Xtreme gluster is such a resource hog
16:16 JoeJulian mmkay
16:20 JoeJulian cglcd: No, fixing the files does not fix directories. Typically directory split-brains are just metadata and I just cleared them (back in the 3.4 days) by resetting the trusted.afr attributes to all zero on both bricks for those directories.
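(A hedged sketch of the xattr reset JoeJulian describes, run directly on the affected directory on both bricks; the volume/client suffixes depend on your volume, so inspect with getfattr first:)

    getfattr -m trusted.afr -d -e hex /data/brick1/some/dir
    setfattr -n trusted.afr.myvol-client-0 -v 0x000000000000000000000000 /data/brick1/some/dir
    setfattr -n trusted.afr.myvol-client-1 -v 0x000000000000000000000000 /data/brick1/some/dir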
16:21 JoeJulian tturkleton: what version are you running?
16:21 sblanton joined #gluster
16:23 cyberswat joined #gluster
16:23 JoeJulian wrt "gluster is such a resource hog" idle resources are wasted resources.
16:24 muneerse joined #gluster
16:27 tturkleton 3.6
16:28 tturkleton JoeJulian: ^
16:28 JoeJulian latest?
16:28 tturkleton 3.6.4
16:29 JoeJulian Mmm. I thought I'd seen a patch addressing queue hogging.
16:29 tturkleton It seems that the heal operation essentially cripples all reads from the working replica
16:30 JoeJulian Where are you bound? cpu? io?
16:30 JoeJulian memory?
16:30 tturkleton I'll kick it off quick, just killed one of the storage nodes, and I'll let you know :)
16:34 ramky joined #gluster
16:34 Lee1092 joined #gluster
16:36 tturkleton Have to re-provision it first, won't actually be as quick as originally thought
16:37 tturkleton Is it better to do a replace-brick or add brick and remove the other brick?
16:37 tturkleton I didn't find any documentation on replacing a brick that failed catastrophically for 3.6, so I was following 3.4 documentation
16:38 tturkleton My current method for healing consists of probing the new node, running a replace-brick start, committing the change, running a full heal, and detaching the old peer
16:41 tturkleton Does that sound accurate, JoeJulian? Thank you for answering my questions btw. :)
16:43 JoeJulian unfortunately, yes.
16:44 Zhang joined #gluster
16:44 JoeJulian Though you can skip the start and just go straight to "commit force"
16:44 tturkleton Alright, thanks :)
16:46 ramky joined #gluster
17:01 neofob left #gluster
17:02 haomaiwang joined #gluster
17:04 tturkleton JoeJulian: Do you recommend running gluster volume heal <VOLNAME> full in this example?
17:04 JoeJulian I do
17:04 tturkleton I believe that's the only real option
17:04 tturkleton k
17:05 JoeJulian If you're cpu bound, change the self-heal algorithm to full. That'll keep it from computing block hashes.
17:05 tturkleton Okay, it is indeed CPU bound
17:05 tturkleton I keep seeing spikes to 80-100%
17:05 tturkleton single core system (trying to be cost-efficient)
17:05 JoeJulian In theory, it shouldn't, but I'm not sure if that's true.
17:06 tturkleton Where's that configured?
17:06 JoeJulian Eww... single core. :(
17:06 tturkleton Yeah...
17:06 JoeJulian gluster volume set help
17:06 tturkleton Deployed at AWS with an m3.medium
17:06 JoeJulian look for algorithm
17:07 tturkleton Sweet, thanks :)
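(The option being pointed at is cluster.data-self-heal-algorithm; a minimal sketch, with the volume name as a placeholder:)

    gluster volume set help | grep -A3 algorithm          # see the description and default
    gluster volume set myvol cluster.data-self-heal-algorithm full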
17:07 JoeJulian You're using ephemeral storage?
17:07 trav408 joined #gluster
17:08 tturkleton EBS
17:09 tturkleton But using AutoScaling groups, so if a node dies, it deletes both the EC2 instance and the EBS behind it
17:09 rafi joined #gluster
17:09 JoeJulian Then why would you need to heal? Spin up a new vm, set the uuid correctly in /var/lib/glusterd/glusterd.info, mount your ebs volume, start glusterd.... nevermind...
17:10 JoeJulian That just sounds like a disaster waiting to happen, imho.
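(A rough outline of the manual replacement JoeJulian sketches, assuming the old brick's EBS volume survives and the replacement host keeps the old hostname; the device name and brick path are placeholders:)

    # on the replacement server, before starting glusterd
    mount /dev/xvdf /bricks/host1             # reattach the old brick data
    vi /var/lib/glusterd/glusterd.info        # set UUID= to the dead server's UUID
    systemctl start glusterd
    gluster volume heal myvol                 # pick up anything written while it was down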
17:11 Zhang joined #gluster
17:12 tturkleton Also, in order to do that, I'd also have to make sure the IP or hostname was the same for the peer connection, yeah? Or would replacing the UUID and setting the new peer be sufficient? That does sound like a disaster waiting to happen for sure. :-/
17:12 JoeJulian You would, but keeping consistent hostnames is easy using their service.
17:12 JoeJulian I forget what it's called. I don't use amazon.
17:12 JoeJulian I know too many people that work there.
17:14 tturkleton hahaha
17:15 tturkleton Even after changing that config setting, it's thrashing CPU pretty hard
17:15 moss joined #gluster
17:15 tturkleton It makes sense, so it sounds like in order to get better performance out of the heal operation, we just need more CPU cycles and there isn't much we can do in the way of tuning, yes?
17:16 JoeJulian Right. You might be able to use cgroups to limit the self-heal daemon.
17:16 moss I am having an issue on Ubuntu 14.04-LTS (gluster installed from ubuntu repos) where GlusterFS refuses to mount on boot. Also, if i adjust my FQDN, gluster daemon wont even start.  Can someone offer me some help?
17:16 tturkleton Which would definitely be a plausible alternative, but on the same token, we want recovery to be somewhat quick, so if we were to do that, it would lengthen the process of the heal
17:16 JoeJulian Though it sounds like it may just be reading the directory listings that's maxing out your core.
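(One hedged way to do the cgroup throttling JoeJulian mentions, using the libcgroup tools on cgroup v1; the group name and share value are arbitrary:)

    cgcreate -g cpu:/glustershd-throttle
    cgset -r cpu.shares=200 glustershd-throttle
    cgclassify -g cpu:/glustershd-throttle $(pgrep -f glustershd)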
17:17 JoeJulian moss: what version is that?
17:18 moss JoeJulian: 3.5.5
17:18 JoeJulian Oh, that's not so bad. I see "LTS" and I expect the worst. :D
17:18 moss JoeJulian: lol, thank god for that :)
17:19 moss So it seems that the daemon will not run, and I don't understand why
17:19 tturkleton Yeah. Our use case unfortunately involves a large number of small files and directories which will result in a ton of iteration
17:19 JoeJulian Did you change the hostname? The hostnames you used for the bricks have to be able to be resolved.
17:19 moss ah
17:19 moss yes
17:19 moss i did
17:20 moss okay
17:20 moss shoot. i dont remember what it was originally :(
17:20 JoeJulian You could add a dns alias or an entry in etc hosts
17:20 JoeJulian The hostnames are all in the volume definitions. You can find them in /var/lib/glusterd/vols
17:21 tturkleton JoeJulian: Do you recommend using hostnames over IP addresses? I've been using IP addresses instead of hostnames
17:21 JoeJulian @hostnames
17:21 glusterbot JoeJulian: Hostnames can be used instead of IPs for server (peer) addresses. To update an existing peer's address from IP to hostname, just probe it by name from any other peer. When creating a new pool, probe all other servers by name from the first, then probe the first by name from just one of the others.
17:21 JoeJulian Absolutely.
17:21 moss JoeJulian: @hostnames ? There is nothing in here about hostnames
17:21 moss I see IP's
17:21 JoeJulian They're much more flexible.
17:22 JoeJulian Ah, did those change?
17:22 moss the ip's? no
17:22 JoeJulian In fact, I prefer using short hostnames. They can always be aliased with search parameters to dns, or /etc/hosts entries.
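(Spelling out glusterbot's @hostnames procedure for a hypothetical three-server pool, once DNS or /etc/hosts entries exist:)

    # from server1
    gluster peer probe server2
    gluster peer probe server3
    # then, from server2 (or server3), fix server1's own entry
    gluster peer probe server1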
17:23 moss well, i cant probe it if the daemon wont run
17:24 moss i have 2 servers that are both serving as servers and clients
17:24 moss for highly available web servers
17:24 JoeJulian Wait... glusterd won't start either way? I thought you were saying that glusterd was starting but the brick was not.
17:24 moss nope
17:24 moss glusterd wont start
17:25 calisto joined #gluster
17:25 JoeJulian try " glusterd --debug  | nc termbin.com 9999 " and share the link if you'd like me to lend my own eyes to the analysis.
17:26 moss Initialization of volume 'management' failed, review your volfile again
17:26 moss it output a LOT of stuff
17:26 moss heh
17:26 JoeJulian yeah, some of it's false errors too (I hate that)
17:26 moss CMA: unable to get RDMA device list
17:26 wushudoin| joined #gluster
17:27 JoeJulian Are you using rdma?
17:27 moss donno
17:27 moss oh
17:27 moss i should mention
17:27 moss this is on a VPS
17:27 JoeJulian not using rdma then
17:28 moss this debug is not very helpful
17:29 moss can i just manually change the hostname = in bricks?
17:29 moss actually
17:29 moss that wont even help
17:29 moss heh
17:29 moss if glusterd is not running
17:30 moss glusterd is thinking im a friend
17:30 moss not local
17:30 moss :\
17:31 wushudoin| joined #gluster
17:32 tturkleton Hrm... JoeJulian, this may be crazy, but would this work? Create a snapshot of the existing EBS volume and attach that to the new host instead of the one that's built when it AutoScales to recover, add and heal that way?
17:34 JoeJulian yes
17:35 tturkleton Is there a reason that would work and an rsync of the brick wouldn't?
17:35 JoeJulian extended attributes
17:35 tturkleton I'm setting up the brick as a blank EBS mounted to /bricks/<HOSTNAME> and creating a brick subdirectory, then replicating that (/bricks/<HOSTNAME>/brick)
17:35 tturkleton Ah
17:36 JoeJulian Oh, I misread
17:36 JoeJulian I thought you had a snapshot of the deleted brick.
17:36 tturkleton Ah, would need to be of the deleted brick, that makes sense
17:36 JoeJulian You don't want to copy the state of the healthy brick to the unhealthy one.
17:36 tturkleton *nod*
17:37 JoeJulian If you don't copy the xattrs, gluster will happily add them and then you'll have state conflict.
17:39 tturkleton So, if we had snapshots of the old brick, the process *could* look like build new EC2 instance with snapshot of old EBS volume, update hostname to match hostname of prior host, update IP address to point to new host in /etc/hosts which will result in peer reconnecting, and then running a heal operation to catch up diffs?
17:39 tturkleton That part will still hit CPU somewhat hard, but for a shorter period of time as the delta should be much smaller, if I understand correctly
17:40 moss This is insane
17:41 JoeJulian tturkleton: correct
17:45 moss JoeJulian: I got the daemon to boot. The daemon wont boot when I change the fqdn
17:45 moss that is the dumbest thing on earth.
17:45 moss if i adjust my /etc/hosts file it works
17:46 JoeJulian file a bug report
17:46 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
17:46 JoeJulian Personally, I see a lot dumber things on this planet on a daily basis. ;)
17:46 moss hahah
17:46 moss now the issue is
17:47 moss on reboot it doesnt mount
17:47 JoeJulian Welcome to debian
17:47 moss ubuntu :P
17:47 moss even worse
17:47 JoeJulian it's an upstart problem that is hit pretty rarely, but just often enough for me to forget how I fixed it...
17:48 moss I saw a few fixes on redhats website but they didnt work.
17:50 JoeJulian yeah, redhat uses either initd or systemd. Upstart was the reason Lennart took a vacation to write systemd. It was a piece of garbage.
17:50 moss lol
17:50 moss horrible.
17:51 tturkleton https://wpengine.atlassian.net/browse/NGI-181
17:51 glusterbot Title: Atlassian Cloud (at wpengine.atlassian.net)
17:51 tturkleton Whoops
17:51 tturkleton haha
17:51 tturkleton Wrong window *facepalm*
17:53 JoeJulian moss http://irclog.perlgeek.de/gluster/2014-07-17#i_9039466
17:53 glusterbot Title: IRC log for #gluster, 2014-07-17 (at irclog.perlgeek.de)
17:54 moss hmm
17:54 moss you added sleep?
17:54 JoeJulian yeah
17:54 moss where
17:54 moss in the upstart script for gluster?
17:54 JoeJulian right
17:56 moss so
17:56 moss glusterfs-server.conf or mounting-glusterfs.conf
17:58 JoeJulian glusterfs-server
17:58 moss and i add it after the exec line?
17:58 JoeJulian It's its own section, if I understand correctly.
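(One plausible form of the "add sleep" workaround being discussed, as its own upstart stanza in /etc/init/glusterfs-server.conf; the delay value is arbitrary and this is a guess at the exact placement rather than the fix from the linked log:)

    # /etc/init/glusterfs-server.conf (excerpt)
    post-start script
        sleep 5
    end script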
17:59 JoeJulian I also avoid upstart like the plague. I went to work for an ubuntu shop, and convinced them to change. They changed to archlinux, but I'm kind of digging it.
17:59 moss lol
17:59 moss I use gentoo personally, but this is for a client.
18:02 haomaiwa_ joined #gluster
18:06 skoduri joined #gluster
18:14 beeradb joined #gluster
18:27 moss JoeJulian: that doesn't work
18:34 JoeJulian I guess I'd check the client log to see why
18:58 edong23 joined #gluster
19:02 haomaiwa_ joined #gluster
19:14 cyberswat joined #gluster
19:20 tturkleton JoeJulian: Would adding a 3rd replicated brick help the heal process at all?
19:21 JoeJulian no
19:22 tturkleton Our thought process would be to have a 3rd replicated brick that essentially sits as a "healer". A new node would come online, be added to the peer list, ACLs dropped in place to block access to the production "read" node, a heal initiated so traffic would be *forced* to come from the 3rd replicated brick
19:22 tturkleton but that sounds somewhat crazy
19:22 tturkleton and quite dangerous
19:23 tturkleton The "healer" and new node would not be written or read from by the servers actually managing the content
19:24 JoeJulian Well in that case, don't make it a replica 3, just add a server from which you do the heal. Might work, I'm not sure if heals are distributed or run on the server from which the heal command was issued.
19:25 JoeJulian But if you add a brick, all clients connect to it.
19:25 JoeJulian Might not be a bad thing though. Might address your load problem and give you added resiliency.
19:26 tturkleton We're mounting like this: "172.16.3.174:/storage /nas glusterfs defaults,_netdev,backupvolfile-server=172.16.3.69 0 0" Will it still mount each of them?
19:26 tturkleton We're not using the storage file method
19:27 elico joined #gluster
19:27 tturkleton The idea being for a multi-AZ set up where it "prefers" the first AZ over the second AZ
19:27 JoeJulian @mount servers
19:27 glusterbot JoeJulian: I do not know about 'mount servers', but I do know about these similar topics: 'mount server'
19:27 JoeJulian @mount server
19:27 glusterbot JoeJulian: (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns, or (#2) One caveat is that the clients never learn of any other management peers. If the client cannot communicate with the mount server, that client will not learn of any volume changes.
19:31 spcmastertim joined #gluster
19:32 tturkleton It seems that glusterbot has a lot of really good information
19:33 tturkleton Another issue I noticed that I'm gathering is more specific to our implementation than an issue with GFS is that using the aforementioned line in /etc/fstab, the GFS filesystem cannot be mounted if one of those two servers are offline. This would lead me to believe that the datastore file would be a better method of mounting it
19:35 JoeJulian except that you then cannot make live changes to your volume or use the management interface.
19:35 JoeJulian @rrdns
19:35 glusterbot JoeJulian: You can use rrdns to allow failover for mounting your volume. See Joe's tutorial: http://goo.gl/ktI6p
19:41 tturkleton Interesting
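(A sketch of the round-robin DNS approach glusterbot points to: one mount name that resolves to every server, so the client can fetch the volfile from whichever answers. The addresses mirror the ones tturkleton pasted; the name is made up:)

    # DNS zone (or /etc/hosts on every client):
    #   gluster.example.com.  IN A  172.16.3.174
    #   gluster.example.com.  IN A  172.16.3.69
    # /etc/fstab:
    gluster.example.com:/storage  /nas  glusterfs  defaults,_netdev  0 0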
19:57 spcmastertim joined #gluster
20:00 beeradb joined #gluster
20:02 64MADNPL0 joined #gluster
20:12 mckaymatt joined #gluster
20:23 ipmango_ joined #gluster
21:01 haomaiwa_ joined #gluster
21:02 TheCthulhu1 joined #gluster
21:12 shortdudey123 joined #gluster
21:16 tturkleton Thanks for all your help JoeJulian :)
21:16 tturkleton We now have a couple ideas on potential approaches to solving this issue
21:19 chuz04arley joined #gluster
21:30 JoeJulian You're quote welcome.
21:30 JoeJulian s/quote/quite/
21:30 glusterbot What JoeJulian meant to say was: You're quite welcome.
21:31 JoeJulian "quote welcome" would be Red Hat.
21:32 shortdudey123 i am experimenting with glusterfs server node failure scenarios.  I have a 3 node replica and when 1 node reboots the mount on the clients becomes unusable for over a minute while the clients detect the failure.  Setting network.ping-timeout to 10s doesn't help much (client mount still hangs for 50 some seconds)
21:32 JoeJulian debian
21:32 shortdudey123 What am i missing? or what do i need to tune for the clients to handle failures faster
21:32 JoeJulian Am I right, or what?
21:33 shortdudey123 Haven't found much documentation on clients handling failed server nodes
21:33 JoeJulian Why they feel the need to stop the network at all is beyond me.
21:33 JoeJulian @ping-timeout
21:33 glusterbot JoeJulian: The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
21:33 JoeJulian That said, a shutdown operation should not cause that.
21:34 PeterA joined #gluster
21:34 JoeJulian The problem is that debian is shutting down the network before all the processes are killed.
21:34 PeterA anyone getting quota issue with symlinks?
21:35 shortdudey123 I am running on centos
21:35 JoeJulian That network shutdown precludes the client from receiving the TCP FIN and elegantly closing the TCP connection. This causes a wait.
21:35 PeterA i patched to 3.5.4 and still get quota mismatch issue
21:35 shortdudey123 7.1 specifically
21:35 JoeJulian Well then that throws all that out the window... ;)
21:35 shortdudey123 :p
21:35 JoeJulian I haven't heard that for 7.1. What version are you running?
21:35 JoeJulian (of gluster)
21:36 shortdudey123 3.6.4
21:37 shortdudey123 same version for glusterfs-fuse
21:40 shortdudey123 If it helps, all 3 server nodes are m3.mediums in AWS
21:49 JoeJulian I'm seeing if I can repro on VMs locally.
21:49 shortdudey123 cool beans
21:50 doekia joined #gluster
22:02 haomaiwa_ joined #gluster
22:18 jdossey joined #gluster
22:18 Agoldman joined #gluster
22:20 Agoldman Hello all, I'm looking for some help with a problem i've got here. the short version is both bricks and the servers they live on ended up off at the same time
22:21 Agoldman one of the servers is back, and we are able to access the data.
22:21 Agoldman but the other server won't connect
22:22 Agoldman here's what I'm getting vor one of the bricks, same issue accross all of them.
22:22 Agoldman Brick vmos-of1-04:/data/.brick/admin-of1-01
22:22 Agoldman Status: Transport endpoint is not connected
22:22 Agoldman Brick vmos-of1-03:/data/.brick/admin-of1-01/
22:22 Agoldman Number of entries in split-brain: 0
22:22 JoeJulian Check the brick logs in /var/log/glusterfs/bricks for clues.
22:23 pdrakeweb joined #gluster
22:23 JoeJulian If you need to share something in the log ,,(paste)
22:23 glusterbot For a simple way to paste output, install netcat (if it's not already) and pipe your output like: | nc termbin.com 9999
22:26 Agoldman Hi Joe, my boss speaks highly of you. I'm seeing a log of DNS resolution fails for the server its self.
22:26 Agoldman It repeats about 20 times so I'll just paste the last line
22:26 Agoldman [2015-08-21 18:27:58.291732] E [name.c:242:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host vmos-of1-04
22:26 Agoldman [2015-08-21 18:27:58.292344] I [socket.c:3358:socket_submit_request] 0-glusterfs: not connected (priv->connected = -1)
22:26 Agoldman [2015-08-21 18:27:58.292370] W [rpc-clnt.c:1566:rpc_clnt_submit] 0-glusterfs: failed to submit rpc-request (XID: 0x92 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
22:27 JoeJulian Well that would be the problem then. Those hosnames have to resolve.
22:27 JoeJulian s/hosn/hostn/
22:27 glusterbot What JoeJulian meant to say was: Well that would be the problem then. Those hostnames have to resolve.
22:28 Agoldman it has a /etc/hosts entry for itself
22:29 JoeJulian On the machine you posted that single error from, can you successfully "ping vmos-of1-04"
22:30 edwardm61 joined #gluster
22:41 Agoldman yes
22:41 Agoldman [root@vmos-of1-04 bricks]# ping vmos-of1-04
22:41 Agoldman PING vmos-of1-04 (127.0.1.1) 56(84) bytes of data.
22:41 Agoldman 64 bytes from vmos-of1-04 (127.0.1.1): icmp_seq=1 ttl=64 time=0.032 ms
22:41 Agoldman 64 bytes from vmos-of1-04 (127.0.1.1): icmp_seq=2 ttl=64 time=0.030 ms
22:41 Agoldman 64 bytes from vmos-of1-04 (127.0.1.1): icmp_seq=3 ttl=64 time=0.031 ms
22:41 Agoldman 64 bytes from vmos-of1-04 (127.0.1.1): icmp_seq=4 ttl=64 time=0.030 ms
22:41 Agoldman 64 bytes from vmos-of1-04 (127.0.1.1): icmp_seq=5 ttl=64 time=0.031 ms
22:41 Agoldman 64 bytes from vmos-of1-04 (127.0.1.1): icmp_seq=6 ttl=64 time=0.031 ms
22:41 Agoldman 64 bytes from vmos-of1-04 (127.0.1.1): icmp_seq=7 ttl=64 time=0.031 ms
22:41 Agoldman ^C
22:41 Agoldman --- vmos-of1-04 ping statistics ---
22:41 Agoldman 7 packets transmitted, 7 received, 0% packet loss, time 5999ms
22:41 glusterbot Agoldman: -'s karma is now -346
22:41 Agoldman rtt min/avg/max/mdev = 0.030/0.030/0.032/0.007 ms
22:41 glusterbot Agoldman: -'s karma is now -347
22:42 JoeJulian @paste
22:42 glusterbot JoeJulian: For a simple way to paste output, install netcat (if it's not already) and pipe your output like: | nc termbin.com 9999
22:42 JoeJulian Please don't flood irc channels with paste.
22:42 Agoldman sorry…
22:42 JoeJulian 127.0.1.1
22:42 JoeJulian There's your problem.
22:43 JoeJulian I think...
22:45 Agoldman okay, I removed the 127… from the /etc/hosts, it will now ping itself as its network address.
22:46 Agoldman might I ask how you would recommend going about testing the getting it to reconnect?
22:46 JoeJulian Probably have to restart glusterd
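(The /etc/hosts change being made here, shown as a sketch; the replacement address is a placeholder since the host's real LAN address is not shown at this point:)

    # before (breaks brick address resolution):
    127.0.1.1        vmos-of1-04
    # after (use the host's real LAN address):
    <lan-address>    vmos-of1-04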
22:49 Agoldman i just did on the issue server, i dont think it worked. do i need to do it for both servers?
22:49 JoeJulian shortdudey123: Yeah, I was able to repro. Now I can figure out what's happening and come up with a workaround.
22:50 JoeJulian I wouldn't think so.
22:50 JoeJulian hmm
22:51 Agoldman hang on, just ran peer status, on the server that is working, looks like it has another server's ip in under 04's name as well as the correct server
22:51 Agoldman ie it thinks that 04's ip is 172.31.31.32 as well as .34
22:52 JoeJulian ah, that might be a problem... ;)
22:52 shortdudey123 JoeJulian: thank you very much! i am connected to a znc bouncer to PM me when you have details and i will see it when i am on
22:53 JoeJulian (me too)
22:53 Agoldman I'd think so…
22:53 shortdudey123 so PM me*
22:53 Agoldman so detach it?
22:53 JoeJulian It won't let you if you have bricks that use that hostname.
22:54 JoeJulian You'll probably have to stop all glusterd and fix it directly in /var/lib/glusterd/peers
22:54 JoeJulian each server should not have a peer file for itself
22:54 Agoldman okay, I'll tak a look at that. Thanks.
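(A hedged outline of the cleanup JoeJulian describes, assuming systemd hosts; peer files are named by the peers' UUIDs, so inspect before removing anything:)

    systemctl stop glusterd                     # on every server
    cat /var/lib/glusterd/glusterd.info         # note each server's own UUID
    grep -r . /var/lib/glusterd/peers/          # uuid/hostname entries for the other peers
    # remove or correct the stale entry (a server must not have a peer file for itself), then:
    systemctl start glusterd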
23:02 haomaiwa_ joined #gluster
23:09 Agoldman Okay, the server is back in the loop and healing, thanks very much.
23:10 Agoldman as for the one that was claiming the wrong hostname, its gluster is off for now, as it keeps trying to come back and reclaim the wrong name. where is that information stored so I can edit it?
23:14 JoeJulian Servers don't know their own hostnames. They have a uuid in /var/lib/glusterd/glusterd.info.
23:14 jbautista- joined #gluster
23:14 JoeJulian Other servers store it in /var/lib/glusterd/peers
23:15 JoeJulian They infer their hostname by looking up bricks. Bricks have hostnames and when one resolves to itself, the server thinks that's it's name.
23:15 JoeJulian s/it's/its/
23:15 glusterbot What JoeJulian meant to say was: They infer their hostname by looking up bricks. Bricks have hostnames and when one resolves to itself, the server thinks that's its name.
23:17 Agoldman cool thanks
23:19 jbautista- joined #gluster
23:20 JoeJulian shortdudey123: Well that was easy. Enable glusterfsd.service.
23:40 highbass joined #gluster
23:40 highbass hey guys i am getting a bit confused with gluster and volume creation
23:40 highbass so i have 3 nodes that i have installed gluster on
23:40 highbass seperate partitions etc ( test purposes)
23:41 highbass how would a replication of 2 work with three nodes?
23:41 highbass or why would it not work?
23:42 JoeJulian highbass: https://joejulian.name/blog/how-to-expand-g​lusterfs-replicated-clusters-by-one-server/
23:42 glusterbot Title: How to expand GlusterFS replicated clusters by one server (at joejulian.name)
23:42 JoeJulian That image should kind-of tell the story you're looking for, I think.
23:42 JoeJulian The point is, you need a multiple of the replica count.
23:46 highbass o my god beautiful
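(One way to satisfy "a multiple of the replica count" with three servers and replica 2 is two bricks per server, chained so adjacent servers replicate each other; this is a sketch with made-up paths, not necessarily the exact layout in the linked post:)

    gluster volume create demo replica 2 \
        node1:/bricks/1/brick node2:/bricks/1/brick \
        node2:/bricks/2/brick node3:/bricks/2/brick \
        node3:/bricks/3/brick node1:/bricks/3/brick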
23:47 shortdudey123 JoeJulian: already enabled -    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
23:49 mckaymatt joined #gluster
23:52 JoeJulian shortdudey123: reread that carefully
23:52 JoeJulian you missed two letters.
23:54 shortdudey123 ah
23:54 shortdudey123 let me look at that a sec
23:54 JoeJulian glusterd vs glusterfsd
23:55 shortdudey123 sudo systemctl | grep glusterfsd
23:55 shortdudey123 returns nothing
23:56 JoeJulian # rpm -q --whatprovides /usr/lib/systemd/system/glusterfsd.service
23:56 JoeJulian glusterfs-server-3.6.4-1.el7.x86_64
23:56 JoeJulian So it should be there.
23:56 JoeJulian Oh, right.
23:56 JoeJulian It wouldn't be there unless you enabled it.
23:56 shortdudey123 doh
23:57 JoeJulian systemctl status glusterfsd should show it disabled and dead.
23:58 shortdudey123 yup haha thanks!
23:59 shortdudey123 glusterfsd looks like it does cleanup of glusterd correct?
