
IRC log for #gluster, 2015-09-22


All times shown according to UTC.

Time Nick Message
00:00 amye joined #gluster
00:13 gildub joined #gluster
00:15 morse_ joined #gluster
00:17 samppah joined #gluster
00:26 tru_tru joined #gluster
00:29 devilspgd joined #gluster
00:30 RedW joined #gluster
00:31 amye joined #gluster
00:49 nangthang joined #gluster
00:51 EinstCrazy joined #gluster
00:54 zhangjn joined #gluster
00:55 Mr_Psmith joined #gluster
01:02 night joined #gluster
01:12 shyam joined #gluster
01:25 Lee1092 joined #gluster
01:51 haomaiwang joined #gluster
02:01 17SADM64Z joined #gluster
02:07 kshlm joined #gluster
02:11 harish joined #gluster
02:14 Mr_Psmith joined #gluster
02:24 rafi joined #gluster
02:25 nangthang joined #gluster
02:38 skoduri joined #gluster
02:40 msciciel_ joined #gluster
02:43 baojg joined #gluster
02:45 Mr_Psmith joined #gluster
02:51 yangfeng joined #gluster
02:53 dgandhi joined #gluster
02:56 suliba joined #gluster
03:01 haomaiwang joined #gluster
03:02 Mr_Psmith joined #gluster
03:03 bharata joined #gluster
03:31 overclk joined #gluster
03:42 atinm joined #gluster
03:45 shubhendu joined #gluster
03:51 TheCthulhu joined #gluster
03:53 nbalacha joined #gluster
03:53 ppai joined #gluster
03:57 TheSeven joined #gluster
03:59 RameshN joined #gluster
03:59 kanagaraj joined #gluster
03:59 sakshi joined #gluster
04:00 DV joined #gluster
04:01 haomaiwang joined #gluster
04:03 itisravi joined #gluster
04:07 gem joined #gluster
04:10 atinm joined #gluster
04:10 jobewan joined #gluster
04:12 cabillman joined #gluster
04:13 DV joined #gluster
04:18 sankarshan joined #gluster
04:20 neha joined #gluster
04:21 TheCthulhu1 joined #gluster
04:21 yazhini joined #gluster
04:22 kshlm joined #gluster
04:32 hchiramm_home joined #gluster
04:44 yazhini joined #gluster
04:59 kotreshhr joined #gluster
04:59 jobewan joined #gluster
05:01 atalur joined #gluster
05:01 haomaiwa_ joined #gluster
05:03 dusmant joined #gluster
05:04 jcastill1 joined #gluster
05:05 deepakcs joined #gluster
05:05 nishanth joined #gluster
05:08 ndarshan joined #gluster
05:09 jcastillo joined #gluster
05:09 calavera joined #gluster
05:09 hgowtham joined #gluster
05:10 SOLDIERz joined #gluster
05:11 pppp joined #gluster
05:12 Bhaskarakiran joined #gluster
05:12 TheCthulhu1 joined #gluster
05:13 overclk joined #gluster
05:14 dusmant joined #gluster
05:20 vimal joined #gluster
05:26 ramky joined #gluster
05:28 overclk joined #gluster
05:36 Manikandan joined #gluster
05:37 mash333 joined #gluster
05:38 streppel joined #gluster
05:38 Apeksha joined #gluster
05:39 ashiq joined #gluster
05:39 owlbot` joined #gluster
05:42 hagarth joined #gluster
05:43 neha joined #gluster
05:44 streppel hey all. my glusterfs is working now, so my next step is to setup ctdb. i already tried to do so, but couldn't get it to work properly. i hope this is the right place for this kind of problem. i'm using 2 hosts with 1 public ip each. i don't want to use ip failover (right now), mainly because this would require the switches to be configured accordingly and that's out of my hand, and i don't want anyone to work on something while i'm just
05:44 streppel testing. my nodes-file contains each node (so both ips of the hosts). since i don't want to use failover i didn't configure the public_addresses file. the recovery lock file is on my shared gluster volume and is created correctly. one host (gluster1) starts ctdb as i would expect, but the other host (gluster2) logs that it can't create the lock file and switches to unhealthy.
05:44 jiffin joined #gluster
05:45 rafi joined #gluster
05:53 Saravana_ joined #gluster
05:55 zhangjn_ joined #gluster
05:55 vmallika joined #gluster
05:57 streppel otherwise, if i try to keep the ip-failover and force one node to use one ip, it removes the currently assigned ip-address but right after that writes in the log "unable to bind to any of the node addresses".
05:58 nangthang joined #gluster
05:59 mhulsman joined #gluster
06:01 kdhananjay joined #gluster
06:01 64MADWJLN joined #gluster
06:07 zhangjn joined #gluster
06:07 maveric_amitc_ joined #gluster
06:12 rastar streppel: have you mounted the shared gluster volume which has the lock file on both nodes?
06:19 ctria joined #gluster
06:22 Manikandan joined #gluster
06:28 overclk joined #gluster
06:28 onorua joined #gluster
06:31 enzob joined #gluster
06:33 enzob joined #gluster
06:36 skoduri joined #gluster
06:38 onorua joined #gluster
06:44 auzty joined #gluster
06:50 streppel rastar: yes i have
06:52 TheCthulhu joined #gluster
06:52 streppel rastar: file creation works correctly in both directions, so gluster itself is working
06:54 rastar streppel: Ok, if you have already configured CTDB_RECOVERY_LOCK=<pathOfLockFileOnLocalNode> in /etc/sysconfig/ctdb, then it seems like a ctdb issue
06:55 streppel rastar: just to make sure, the lockfile has to be on the gluster volume and be the same for all nodes, correct?
06:56 rastar streppel: yes
06:56 rastar all ctdb processes need to access the same file and they access it through local mount of glusterfs
06:57 streppel CTDB_RECOVERY_LOCK=/glustermount/lock/lockfile
06:57 streppel is what i have in my config
06:57 streppel the glustermount is mounted during startup via fstab
06:58 rastar streppel: Perfect, then all you need to check is if the mount succeeded on both nodes or not
06:58 streppel rastar: mounting succeeded on both nodes
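
A minimal sketch of the ctdb pieces streppel describes above, with hypothetical node addresses, volume name and mount point:

    # /etc/sysconfig/ctdb (Debian/Ubuntu: /etc/default/ctdb)
    CTDB_RECOVERY_LOCK=/glustermount/lock/lockfile

    # /etc/ctdb/nodes -- one node IP per line, identical file on every node
    10.0.0.1
    10.0.0.2

    # /etc/fstab -- the shared gluster volume holding the lock file, mounted on both nodes
    gluster1:/ctdbvol  /glustermount  glusterfs  defaults,_netdev  0 0
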
07:01 haomaiwa_ joined #gluster
07:01 streppel http://termbin.com/q8pt
07:01 rastar streppel: Needs more debugging then.. recommended ctdb versions are >= 2.5
07:02 streppel rastar: this is what i get on node 1. node2 started ctdb correctly
07:02 streppel rastar: ctdb version is 4.2.3 on both machines
07:04 shubhendu joined #gluster
07:06 rastar streppel: I have not tried that version of ctdb yet..
07:06 rastar streppel: seems like a ctdb change of behavior , try asking on samba mailing list
07:08 streppel rastar: since it is not working at all right now i can easily switch to a version you'd recommend.
07:09 rastar streppel: http://download.gluster.org/pub/gluster/glusterfs/samba/CentOS/epel-6Server/x86_64/
07:09 glusterbot Title: Index of /pub/gluster/glusterfs/samba/CentOS/epel-6Server/x86_64 (at download.gluster.org)
07:10 rastar streppel: you can find samba and ctdb builds built against gluster/gfapi at this location. Navigate to the OS and arch you would want ^^
07:14 epoch joined #gluster
07:17 [Enrico] joined #gluster
07:19 anil joined #gluster
07:21 rgustafs joined #gluster
07:25 nishanth joined #gluster
07:28 ctria joined #gluster
07:32 David_Varghese joined #gluster
07:32 the-me joined #gluster
07:33 onorua joined #gluster
07:34 zhangjn joined #gluster
07:35 zhangjn joined #gluster
07:43 leucos joined #gluster
07:44 harish joined #gluster
07:47 JonathanD joined #gluster
07:50 Lee1092 joined #gluster
07:50 LebedevRI joined #gluster
07:51 TheCthulhu1 joined #gluster
07:51 akik joined #gluster
07:57 atalur joined #gluster
08:01 haomaiwa_ joined #gluster
08:02 akik i'm trying to setup gluster in azure. i have name resolution working, endpoints configured (111, 2049, 24007, 49152) and installed version 3.7.4 of glusterfs. does gluster require ping to be available between nodes? on azure icmp is disabled
08:02 akik here is some log http://pastebin.com/raw.php?i=6VfhbAqs
08:02 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
08:03 akik even though gluster peer status says "peer in cluster", the volume creation fails
08:04 jcastill1 joined #gluster
08:07 streppel akik: you have to peer probe hostA from hostB and the other way around
08:07 streppel ah no forget it
08:07 akik streppel: yes i did that, before those commands
08:08 streppel yep i read the log incorrectly
08:08 akik azure is giving me nightmares
08:09 streppel akik: i don't know azure enough, but do you have to open ports there? i'd guess that's the problem
08:09 jcastillo joined #gluster
08:09 akik on the server, the ip address is 10.0.0.4 and when connecting to other hosts there is a permanent nat
08:10 TheCthulhu joined #gluster
08:10 akik streppel: i have created those azure endpoints which "open" the needed ports, and disabled iptables for now
08:11 akik if i start sshd on those ports i can see that connectivity works
08:11 akik i was thinking that the problem could be with this network address translation
08:11 jwd joined #gluster
08:11 streppel http://www.gluster.org/community/documentation/index.php/Basic_Gluster_Troubleshooting all of these ports?
08:12 akik streppel: yes
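
For reference, a sketch of opening those ports with firewalld; the exact brick range is an assumption (each brick uses one port, starting at 49152 on recent releases), and 111/2049 only matter if gluster NFS is in use:

    firewall-cmd --permanent --add-port=24007-24008/tcp                 # glusterd management
    firewall-cmd --permanent --add-port=111/tcp --add-port=111/udp --add-port=2049/tcp
    firewall-cmd --permanent --add-port=49152-49160/tcp                 # brick ports
    firewall-cmd --reload
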
08:12 streppel does nslookup work correctly between the 2 hosts? (aka "nslookup hostB.domain" on hostA and the other way around)
08:12 akik streppel: yes, i installed dnsmasq to ensure it
08:13 akik streppel: otherwise it wasn't working
08:13 streppel akik: selinux enabled?
08:13 akik streppel: yes
08:13 streppel akik: anything blocking there? or is it set to permissive? or did you set the correct attributes?
08:14 gvs77 joined #gluster
08:15 akik streppel: it's enforcing but i didn't see any avc denied messages. i can try to use setenforce 0
08:16 gvs77 Hi, I'm using glusterfs 3.5.2 in replicate mode.  I run kvm on top of the gluster, but a busy VM is freezing due to flushes taking too long occasionally.  anyone else seeing something similar?
08:16 akik streppel: oh wow, i have missed an avc denied message in there
08:16 gvs77 And secondly, to tune options like performance.cache-size, I just run them on one node?
08:17 akik streppel: http://fpaste.org/270006/29098321/raw/
08:19 streppel akik: that could be a reason, not? :)
08:20 akik yea.. fixing now
08:24 akik streppel: well that didn't help
08:25 streppel akik: did you try to create the volume again? anything new in the log?
08:25 akik still getting the same error from volume create
08:25 streppel hm ok
08:25 TheCthulhu joined #gluster
08:27 akik streppel: http://fpaste.org/270010/42910435/raw/
08:27 akik does glusterfs need icmp? because icmp doesn't work in azure
08:28 streppel akik: http://serverfault.com/questions/531425/glusterfs-host-not-connected-when-creating-a-volume-on-vbox-vms
08:28 glusterbot Title: routing - GlusterFS - 'Host not connected' when creating a volume on VBox VMs - Server Fault (at serverfault.com)
08:29 streppel according to this it does, but there is for sure an option to disable this check
08:30 akik what is weird is that gluster peer status shows the hosts as connected
08:30 akik from either size
08:30 akik side
08:33 Romeor wazzup
08:33 akik now that i think about centos selinux, by default it is not logging all the messages. i'll look into that
08:35 akik although it shouldn't be the problem case since i can use setenforce 0 to disable it
08:36 suliba joined #gluster
08:37 * Romeor hates selinux and rh (with clones)
08:37 Romeor just fyi
08:37 Romeor :D
08:38 akik Romeor: you can go far enough with /var/log/audit/audit.log and audit2allow from policycoreutils-python
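
A short sketch of that workflow, assuming the denials land in /var/log/audit/audit.log (the module name is made up; review the generated policy before loading it):

    ausearch -m avc -ts recent                                  # list recent AVC denials
    grep gluster /var/log/audit/audit.log | audit2allow -M glusterlocal
    semodule -i glusterlocal.pp                                 # load the local policy module
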
08:38 Romeor i prefer just to disable it
08:39 csim I prefer to be secure, but your server, not mine :)
08:39 Romeor selinux != secure
08:39 streppel i guess the peer status describes the socket between the two, but still if you can't ping the other peers it sees them as offline because pings have a higher priority
08:40 akik streppel: does it say somewhere that glusterfs needs icmp?
08:40 streppel i disable selinux on my servers. that said, my servers are behind a firewall and if anything goes wrong where it shouldn't the network guys will let me know before anything bad happens
08:40 hchiramm_home joined #gluster
08:41 akik because, well, if it does then i have a problem with running glusterfs on azure
08:41 Romeor secure is when your server has its network cable disconnected. security is not about "if i'll be hacked" its about "when i'll be hacked". so selinux gives more problems than security. you have to take other actions to secure yourself and one of them is hardened monitoring.
08:42 csim Romeor: well, when I will be hacked, the hacker will still be limited by selinux, if that suite you
08:42 csim but hey, as i say, that's your server, not mine
08:42 streppel akik: http://azure.microsoft.com/en-us/documentation/templates/gluster-file-system/
08:42 glusterbot Title: Deploys a N node Gluster File System (at azure.microsoft.com)
08:42 streppel akik: this is a template for an n-node gluster filesystem, so it does seem to work with azure
08:43 Romeor the only hacker could be limited by selinux is schoolboy :)
08:43 Romeor but yes, i just share my opinion, how to make admin's life easier.
08:44 streppel akik: according to this https://social.msdn.microsoft.com/Forums/azure/en-US/0669112c-a6dd-4290-bcde-9ce7b9d60d80/how-do-i-enable-pinging-a-vm?forum=WAVirtualMachinesforWindows ping between VMs works but has to be enabled in the firewall, so make sure iptables is disabled
08:44 glusterbot Title: How Do I Enable Pinging A VM (at social.msdn.microsoft.com)
08:44 akik streppel: enabled in which firewall?
08:44 akik streppel: i have iptables disabled now on the two hosts
08:44 streppel the guests
08:44 streppel ok
08:45 akik streppel: i have seen that if you have your own private network segment in azure icmp works
08:45 akik streppel: in my case i have to use "separate" hosts with no common network
08:46 akik so the hosts can be considered external to each other
08:47 akik streppel: i have set option ping-timeout 0 in /etc/glusterfs/glusterd.vol but it didn't help in my problem
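
Note that this ping-timeout appears to refer to gluster's own RPC-level ping between its processes, not ICMP, so it would not be expected to work around blocked ICMP. For reference, the per-volume equivalent (volume name is a placeholder):

    gluster volume set myvol network.ping-timeout 30   # seconds before a brick connection is declared dead
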
08:53 akik the thing about selinux not logging everything is described here https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Fixing_Problems-Possible_Causes_of_Silent_Denials.html
08:53 glusterbot Title: 8.3.2. Possible Causes of Silent Denials (at access.redhat.com)
08:54 Bhaskarakiran joined #gluster
08:55 TheCthulhu joined #gluster
08:55 hagarth joined #gluster
08:56 harish joined #gluster
09:00 akik can my problem be classified as a bug? the service ports are open but icmp is not
09:01 haomaiwa_ joined #gluster
09:01 yosafbridge joined #gluster
09:03 Pupeno joined #gluster
09:03 Pupeno joined #gluster
09:04 TheCthulhu joined #gluster
09:11 akay hi guys, can someone confirm something for me... when i mount a gluster client to the volume using samba, it connects only to the server i specify - however when i use fuse mount the client opens connections to all nodes?
09:12 cyberbootje joined #gluster
09:12 arcolife joined #gluster
09:19 shubhendu joined #gluster
09:24 Bhaskarakiran joined #gluster
09:25 TheCthulhu joined #gluster
09:27 atinm joined #gluster
09:29 akik ok here it says to check that you can ping all nodes http://www.gluster.org/community/documentation/index.php/Gluster_3.1:_What_to_Check_First
09:35 TheCthulhu joined #gluster
09:41 Driskell joined #gluster
09:42 schatzi joined #gluster
09:44 mhulsman1 joined #gluster
09:46 Pintomatic joined #gluster
09:47 mhulsman joined #gluster
09:48 frankS2 joined #gluster
09:48 lezo joined #gluster
09:49 mhulsman1 joined #gluster
09:52 twisted` joined #gluster
09:53 fyxim joined #gluster
09:54 TheCthulhu joined #gluster
09:55 billputer joined #gluster
10:01 TheCthulhu1 joined #gluster
10:01 haomaiwa_ joined #gluster
10:02 samsaffron___ joined #gluster
10:03 scubacuda joined #gluster
10:15 TheCthulhu joined #gluster
10:21 shubhendu joined #gluster
10:23 TheCthulhu joined #gluster
10:31 jcastill1 joined #gluster
10:35 raghu joined #gluster
10:35 raghu joined #gluster
10:36 jcastillo joined #gluster
10:36 atinm joined #gluster
10:42 kotreshhr joined #gluster
10:46 Leildin JoeJulian, I've put more info into my bug report about rebalance not working in 3.6.2
10:47 Leildin after changing files manually the rebalance did go through but once finished the files changed back and the clients that were rebooted could not connect anymore
10:47 Leildin I'm thinking of going up to 3.7, would you recommend it or just upgrading to 3.6.5 ?
10:53 rgustafs joined #gluster
10:53 ashiq- joined #gluster
10:58 overclk joined #gluster
10:59 streppel akik: any progress? :)
11:00 Bhaskarakiran joined #gluster
11:01 1JTAAB0QO joined #gluster
11:03 rafi joined #gluster
11:04 rafi1 joined #gluster
11:05 pppp joined #gluster
11:13 haomaiwa_ joined #gluster
11:16 itisravi joined #gluster
11:16 kkeithley1 joined #gluster
11:18 akik streppel: no, i had to start working
11:19 akik streppel: i'll probably send a message to the Gluster-users mailing list and see if somebody can help
11:24 mhulsman joined #gluster
11:26 jiffin l
11:27 firemanxbr joined #gluster
11:29 mhulsman1 joined #gluster
11:30 overclk joined #gluster
11:30 TheCthulhu joined #gluster
11:32 mhulsman joined #gluster
11:36 akik streppel: any ideas on glusterfs and not being able to ping the hosts? i'm in a position where i have to run the vm's on different azure user accounts
11:36 akik if i was using one azure account, i could create a logical network where i could put the glusterfs hosts
11:37 julim joined #gluster
11:45 TheCthulhu joined #gluster
11:46 mhulsman1 joined #gluster
11:47 streppel akik: sorry, no idea right now
11:47 jiffin1 joined #gluster
11:51 kokopelli joined #gluster
11:51 kokopelli hi all
11:51 DV__ joined #gluster
11:52 dusmant joined #gluster
11:57 jcastill1 joined #gluster
11:57 hagarth hello
11:57 glusterbot hagarth: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
11:58 gvs77 left #gluster
12:00 TheCthulhu1 joined #gluster
12:01 rafi joined #gluster
12:01 neha joined #gluster
12:06 jiffin1 joined #gluster
12:08 unclemarc joined #gluster
12:14 deepakcs hagarth: didn't expect glusterbot to warn u, we should fix glusterbot :))
12:15 hagarth deepakcs: I was responding to kokopelli :)
12:15 jcastillo joined #gluster
12:15 deepakcs hagarth: i know, just kidding :)
12:24 kokopelli :) I need help with healing. I've 3 nodes and all of them have different sizes. I already ran heal for the related volume. How can i list corrupted data? Version: 3.7.2
12:25 hagarth kokopelli: does volume heal info list anything?
12:32 kokopelli hmm, i can try, but i have big data, is it a problem for iops?
12:33 yosafbridge joined #gluster
12:34 hagarth kokopelli: volume heal info should not impact any data operations
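
For reference, the heal commands being discussed (volume name is a placeholder):

    gluster volume heal myvol info              # list entries still pending heal
    gluster volume heal myvol info split-brain  # list only split-brain entries
    gluster volume heal myvol                   # trigger an index heal
    gluster volume heal myvol full              # full sweep of all bricks
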
12:38 nbalacha joined #gluster
12:39 Slashman joined #gluster
12:44 Mr_Psmith joined #gluster
12:45 kokopelli @hagarth: Ok, it's running now and giving list..
12:46 Bhaskarakiran joined #gluster
12:48 SOLDIERz_ joined #gluster
12:52 hchiramm_home joined #gluster
13:00 haomaiwa_ joined #gluster
13:01 haomaiwa_ joined #gluster
13:01 rwheeler joined #gluster
13:04 atrius joined #gluster
13:06 hagarth kokopelli: the list in volume heal info should be empty at some point in time
13:07 mhulsman joined #gluster
13:11 kokopelli @hagarth: thanks, it's still running and the list is really long. Will self-heal fix the list or do i have to fix it manually?
13:12 hagarth kokopelli: self-heal should fix the list
13:13 kokopelli hagarth: very well, i have a million+ files, i think it will take a long time
13:14 hagarth kokopelli: I would expect that. do you know why there are so many files requiring self-heal?
13:15 theron joined #gluster
13:23 kokopelli yes, the system has been running since 2013, i've got 150TB+ of data and sometimes some nodes have failed.
13:24 dgandhi joined #gluster
13:38 spcmastertim joined #gluster
13:47 neofob joined #gluster
13:50 Pupeno joined #gluster
13:50 Pupeno joined #gluster
13:53 neha joined #gluster
13:55 mhulsman1 joined #gluster
14:00 mhulsman joined #gluster
14:01 haomaiwang joined #gluster
14:02 mhulsman joined #gluster
14:06 mhulsman1 joined #gluster
14:08 mhulsman joined #gluster
14:13 JoeJulian Leildin: I like 3.7. I haven't heard anything bad about it in a couple of releases.
14:14 JoeJulian Leildin: I still don't think it'll change that bug, but it's worth a try.
14:14 Leildin ah great, if it f**ks up I can always say it was your recommendation !
14:14 JoeJulian lol
14:15 Leildin I don't understand where it's getting its info to keep generating the files incorrectly
14:15 Leildin do you know how it generates the vol files ?
14:15 JoeJulian /var/lib/glusterd/vols/$volume_name/info
14:16 Leildin ok I'll check that
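
One quick way to check whether the peers disagree about that file, as a sketch for a two-server setup (hostname and volume name are hypothetical):

    md5sum /var/lib/glusterd/vols/myvol/info
    ssh server2 md5sum /var/lib/glusterd/vols/myvol/info   # the checksums should match
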
14:17 yosafbridge joined #gluster
14:17 arcolife joined #gluster
14:18 bennyturns joined #gluster
14:22 mpietersen joined #gluster
14:25 Leildin nothing wrong to me ... could you check out a paste ?
14:26 jwd joined #gluster
14:28 jcastill1 joined #gluster
14:29 jwaibel joined #gluster
14:29 bennyturns joined #gluster
14:30 zhangjn joined #gluster
14:31 zhangjn joined #gluster
14:32 zhangjn joined #gluster
14:33 jcastillo joined #gluster
14:34 T1loc joined #gluster
14:34 T1loc Hello !
14:34 Leildin hi T1loc
14:34 T1loc I have a problem with glusterfs (Splitbrain) but I don't know how to fix this problem : http://susepaste.org/9ee629cb
14:35 glusterbot Title: SUSE Paste (at susepaste.org)
14:35 T1loc I think the video on brick-b is good
14:35 T1loc and the splitbrain is : metadata and file splitbrain, right ?
14:36 JoeJulian Leildin: Sure.
14:37 Leildin http://ur1.ca/nu8ql
14:37 glusterbot Title: #270147 Fedora Project Pastebin (at ur1.ca)
14:37 JoeJulian T1loc: what version of gluster?
14:38 T1loc JoeJulian: : 3.7.3
14:38 zhangjn joined #gluster
14:39 zhangjn joined #gluster
14:40 yangfeng joined #gluster
14:41 zhangjn joined #gluster
14:42 zhangjn joined #gluster
14:43 JoeJulian T1loc: There are instructions someplace here: http://gluster.readthedocs.org/en/latest/ . I have to walk my daughter to school so I don't have the time it would take to find it right now.
14:43 glusterbot Title: Gluster Docs (at gluster.readthedocs.org)
14:43 zhangjn joined #gluster
14:44 zhangjn joined #gluster
14:45 GB21 joined #gluster
14:45 GB21_ joined #gluster
14:48 arcolife joined #gluster
14:52 DV joined #gluster
14:57 T1loc I only find documentation for split brain with Type: Distributed-Replicate, but my bricks are Type: Replicate
14:59 kkeithley_ T1loc: same thing. Split-brain applies to any AFR (Replicate) volume.
15:00 T1loc kkeithley_: But when we do the getfattr we have only client-0 on brick A and client-1 on brick B, shouldn't it have both?
15:00 amye joined #gluster
15:01 haomaiwang joined #gluster
15:02 kkeithley_ T1loc: I don't understand. What is your question?
15:06 T1loc Is it normal that I don't have the two trusted.afr.videos-client-{0,1} attributes on each server?
15:07 JoeJulian That's the old way. There's a new easier way.
15:07 JoeJulian Damn this documentation is still hard to navigate.
15:09 T1loc JoeJulian: If only I was not looking at the documentation...
15:11 JoeJulian I'm trying to find it. 3.7 added features that make it easier.
15:12 dusmant joined #gluster
15:12 JoeJulian T1loc: https://gluster.readthedocs.org/en/release-3.7.0/Features/heal-info-and-split-brain-resolution/
15:12 glusterbot Title: heal info and split brain resolution - Gluster Docs (at gluster.readthedocs.org)
15:14 EinstCrazy joined #gluster
15:15 virusuy Hi guys, one node of my distributed replicated volume (2x2) was off for a few days, and now when i try to reconnect it to the network it says "Cksums of volume VOL differ."
15:16 _Bryan_ joined #gluster
15:17 T1loc JoeJulian: thank you for the latest link.
15:18 JoeJulian virusuy: that means the volume definition was changed while it was offline. rsync it from a good server (/var/lib/glusterd/vols/$volume_name)
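
A sketch of that recovery on the out-of-date node, assuming goodserver is a peer whose definition is current and the volume is called myvol:

    service glusterd stop        # or: systemctl stop glusterd
    rsync -av goodserver:/var/lib/glusterd/vols/myvol/ /var/lib/glusterd/vols/myvol/
    service glusterd start
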
15:18 kkeithley_ @split-brain
15:18 glusterbot kkeithley_: To heal split-brains, see https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md . Also see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/ . For additional information, see this older article https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
15:18 JoeJulian yeah, probably should be updated.
15:18 virusuy JoeJulian:  oh, cool, thanks !
15:19 JoeJulian Especially since I had to google with specific words I could remember from having read the doc I referenced in order to find it.
15:19 kkeithley_ @forget split-brain
15:19 glusterbot kkeithley_: The operation succeeded.
15:19 wushudoin joined #gluster
15:20 kkeithley_ @learn split-brain as To heal split-brains, see https://gluster.readthedocs.org/en/release-3.7.0/Features/heal-info-and-split-brain-resolution/  For additional information, see this older article https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/  Also see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
15:20 glusterbot kkeithley_: The operation succeeded.
15:20 kkeithley_ @split-brain
15:20 glusterbot kkeithley_: To heal split-brains, see https://gluster.readthedocs.org/en/release-3.7.0/Features/heal-info-and-split-brain-resolution/ For additional information, see this older article https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ Also see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
15:22 virusuy JoeJulian:  That worked like a charm, thanks gent
15:24 JoeJulian You're welcome.
15:25 akay does anyone have experience with disperse-distributed volumes? how does it compare to hardware raid in terms of performance and reliability?
15:26 JoeJulian I haven't seen anything about that yet. If you test it, please blog about it so I can point people there.
15:27 JoeJulian As for reliability, it's just math. It would be the same or better by a few fractions of a percent depending on how you build it.
15:30 akay sure thing JoeJulian. what about between raid6 replica 2 and one brick per disk replica 3? any personal preference?
15:31 JoeJulian Depends on SLA/OLA requirements, performance, funding.
15:32 JoeJulian With sufficient funding I'd probably favor 4 disk raid 0 per brick, replica 3.
15:33 akay the main concern for this would be performance and reliability... although raid6 allows more disks to fail id say the odds of the 3 exact disks failing in a per brick scenario would be very rare
15:33 JoeJulian Maybe throw in some SSD journals.
15:33 TheCthulhu joined #gluster
15:34 JoeJulian http://wintelguy.com/raidmttdl.pl
15:34 glusterbot Title: RAID Reliability Calculator - WintelGuy.com (at wintelguy.com)
15:34 akay rebuild times on raid0 wouldnt be a concern?
15:34 vanshyr joined #gluster
15:34 JoeJulian Not really. You've still got 2 good replica during the heal.
15:35 JoeJulian Should be a good solid 6 nines.
15:35 hagarth joined #gluster
15:35 akay nice calculator!
15:36 akay our current setup has 12 disk raid6 as recommended by redhat, but not sure if there are better ways to do it
15:37 akay you would prefer the raid0 rep 3 instead of just one brick per disk?
15:37 JoeJulian Different ... better... all a matter of perspective.
15:37 akay fair enough
15:37 JoeJulian The raid0 gives you the spindle performance, assuming you have a network that can keep up.
15:37 JoeJulian rep 3 for the reliability.
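
For illustration, creating the kind of layout JoeJulian describes, one RAID0 set per server used as a brick, replica 3 (hostnames and paths are made up):

    gluster volume create bigvol replica 3 \
        server1:/bricks/r0/brick server2:/bricks/r0/brick server3:/bricks/r0/brick
    gluster volume start bigvol
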
15:38 akay does gluster disperse still have the same write penalty?
15:38 vanshyr Hi! I am doing some tests with gluster version 3.6.5 and i have a couple of questions: how do you recover the volumes after they are full? I mean in the client i can see the file system and i execute the delete command on the files and i see them disappearing from the client list, but the space is still in use, and if i go to the servers i can see one of them with the disk at 100%, so what is the correct procedure to fix it?
15:38 pdrakeweb joined #gluster
15:38 JoeJulian akay: not sure. I haven't had the time to analyze that like I did with ,,(stripe).
15:38 glusterbot akay: Please see http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/ about stripe volumes.
15:38 luis_silva joined #gluster
15:39 JoeJulian vanshyr: I would have expected that to work. Check your brick logs for errors. If there's a problem with that, that's where I would expect to find it.
15:40 JoeJulian Unless, of course, those deleted files are still open, but I assume you already know and checked that.
15:40 JoeJulian bbiab... gotta make me some coffee.
15:40 ashka hi, on a type: distribute volume, can I prioritize where files go when I do a remove-brick operation ?
15:41 JoeJulian ashka: no, but that sounds like it would be a good feature request. file a bug
15:41 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
15:41 virusuy JoeJulian:  Sir, auto-heal started but seems to be removing files rather than adding them, is that normal ?
15:41 virusuy JoeJulian:  i mean, i see "available" space increasing instead of decreasing
15:42 akik is it absolutely necessary to have ping working between glusterfs nodes? i'm trying to build glusterfs on azure and this seems to be the stopping block
15:42 akay thanks for that JoeJulian, illl do some testing
15:44 vanshyr JoeJulian yes i have been checking the logs and there is an error but i don't know what produced it. i even tried to reboot the boxes to be sure there is no process around accessing them. in the logs i can see some servers still thinking those files are there and that they can not access them: W [ec-combine.c:801:ec_combine_check] 0-tank-disperse-1: Mismatching xdata in answers of 'LOOKUP'
15:45 luis_silva Hey guys quick questions. We are running glusterfs-3.3.1-1 and we a have 3 bricks that are 100%, 76% and 50% full. I tried rebalance but it does not seem to be working. I also tried passing fix-layout. Any advice on what to try next?
15:46 vanshyr I produced the error on purpose for the test, uploading a 207GB file to a gluster volume of 300 GB where each server really only has about 70GB free, so i was expecting it to disperse that file among all of them, but what i found is that only 3 are used; those 3 form 1 volume
15:46 theron_ joined #gluster
15:47 vanshyr If i try to start the copying again maybe gluster starts to use the second volume and then the same scenario: only 3 servers are writing the data instead of distributing it among the 6. is that the normal behaviour?
15:53 CU-Paul joined #gluster
15:54 JoeJulian virusuy: It would remove the "bad" version, then self-heal the "good" one back over, so it seems like that could explain what you're seeing.
15:54 JoeJulian vanshyr: " W " warning. Any " E "?
15:54 CU-Paul Hi all, I just added a new brick to my volume and am seeing the following errors in var/log/glusterfs/brick/brick.log: [2015-09-22 15:46:01.420888] I [server-rpc-fops.c:475:server_mkdir_cbk] 0-gv0-server: 1498: MKDIR /childcare (00000000-0000-0000-0000-000000000001/childcare) ==> (Permission denied)
15:55 CU-Paul Can someone point me in the right direction on this?  My googles have come up short.
15:55 virusuy JoeJulian:  yeah, but the whole volume ? or just those files that are not in the "bad" node ?
15:55 JoeJulian luis_silva: I never successfully rebalanced with 3.3. Can you upgrade?
15:55 JoeJulian virusuy: the latter.
15:55 virusuy JoeJulian:  ok, seems reasonable , thanks for your help sir !
15:56 luis_silva I suppose we should upgrade then.
15:57 luis_silva Thanks Joe.
15:57 JoeJulian vanshyr: distribute balances disk usage by hashing the filenames and spreading the files out among its subvolumes based on hash masks. Sounds like you're looking for disperse or stripe.
15:57 JoeJulian vanshyr: if the same file is being written to three servers, are you using replica 3?
16:01 JoeJulian CU-Paul: that's an informational message, " I " so I wouldn't be super worried about it unless something isn't working.
16:01 haomaiwang joined #gluster
16:01 shubhendu joined #gluster
16:02 JoeJulian It's a busy morning for a Tuesday. :D
16:02 CU-Paul JoeJulian: I am not yet seeing any replication in the new volume after mounting it on a client.
16:03 JoeJulian ~pasteinfo | CU-Paul
16:03 glusterbot CU-Paul: Please paste the output of "gluster volume info" to http://fpaste.org or http://dpaste.org then paste the link that's generated here.
16:03 jobewan joined #gluster
16:04 zhangjn joined #gluster
16:04 CU-Paul JoeJulian: http://fpaste.org/270177/14429378/
16:04 glusterbot Title: #270177 Fedora Project Pastebin (at fpaste.org)
16:05 JoeJulian Cool. That rules out the obvious.
16:05 JoeJulian have you checked, "gluster volume heal $vol info"?
16:06 CU-Paul Yes, 405 entries on the first node, nothing on this new node.
16:06 JoeJulian Right, that would make sense. The heals would go from the existing replicas to the new one.
16:07 JoeJulian Any errors in /var/log/glusterfs/glustershd.log on the first server?
16:08 vanshyr JoeJulian The only E i see in the client side logs is when i rebooted the machines, informing about a handshake failure because it could not connect. If i search for the file at the mount point in the servers i don't see it, but if i go to the .glusterfs directory i can see that the size inside is equivalent to the deleted file. I am evaluating gluster as a solution to store files of different sizes but mostly GB/TB so the idea of distribute-disper
16:08 vanshyr se was to be able to read the big files from diferent servers at same time and provide redundancy in case of node failure. When i created the test i created 3 bricks with one redundancy and then later i added another 3 so now it marks them as  Number of Bricks: 2 x (2 + 1) = 6
16:08 JoeJulian btw... I'm guessing at which server would be running the heal based on the notion that it'll try to heal from the one with the indication.
16:08 CU-Paul Nope, only connection errors to a peer that is purposely disconnected right now.
16:10 JoeJulian vanshyr: So you've got 2 (2+1) disperse subvolumes to distribute across. So half your files would be created on one set of three, and the other half would be on the other set of three (statistically).
16:12 JoeJulian CU-Paul: try "gluster volume heal $vol full"
16:12 CU-Paul JoeJulian: From any of the bricks?
16:13 JoeJulian right
16:13 vanshyr so then for my scenario would it be better to create just 1 volume with the 6 nodes having like 5+1? (and that means that every time i would like to expand it i will have to put in 6 new boxes at a time, right?)
16:13 Leildin JoeJulian, thanks for decrypting that on my bug report. they need to tell me, not suggest. I am basically a headless chicken at this point. This bug is beyond me
16:13 JoeJulian vanshyr: If that suits your use case, yes. I try not to make those decisions.
16:14 CU-Paul JoeJulian: Usual successful launch message
16:14 _joel joined #gluster
16:14 JoeJulian CU-Paul: Ah, right. You have a glusterd offline.
16:14 JoeJulian CU-Paul: If you brought that glusterd online, you could still kill the glusterfsd (brick) process if you require that brick to be out.
16:15 CU-Paul Gotcha.
16:15 akay when i mount a gluster client to the volume using samba i seem to get better performance, but it connects only to the server i specify - however when i use fuse mount the client opens connections to all nodes... would there be much of a performance difference with concurrent file transfers?
16:16 xMopxShell joined #gluster
16:17 CU-Paul JoeJulian: Restarted glusterfs-server, killed glusterfsd pids, heal full gave same successful launch
16:17 vanshyr JoeJulian i understand. I am thinking to set up a new infrastructure for a Bioinformatics project, so they need big amounts of space, speed, redundancy and ACLv4. right now i have it deployed with FreeBSD+ZFS+iSCSI, 6 boxes exporting a 300TB ZFS pool to 1. I was reading about GlusterFS + Ganesha (for the ACLv4) and i want to give it a try but i think the disperse/distribute nomenclature confuses me :)
16:18 JoeJulian akay: The only performance benefit you're seeing from samba is the smb client caching, though if it's using the libgfapi vfs you could be avoiding some context switching which would be faster. Generally with single or multiple file ops, I can max out the network with fuse.
16:19 JoeJulian @lucky dht misses are expensive
16:19 glusterbot JoeJulian: https://joejulian.name/blog/dht-misses-are-expensive/
16:19 JoeJulian vanshyr: Check out ^ for a detailed explanation of distribute.
16:21 calavera joined #gluster
16:21 vanshyr thx joejulian
16:21 JoeJulian CU-Paul: Going to have to check logs to find out why that's unsuccessful. It's probably in one of the glusterd logs (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log)
16:23 JoeJulian This is the spot where I usually plug logstash. It's nice to have all your logs in one place.
16:23 JoeJulian s/nice/invaluable/
16:23 glusterbot What JoeJulian meant to say was: This is the spot where I usually plug logstash. It's invaluable to have all your logs in one place.
16:24 cholcombe joined #gluster
16:24 akay browsing folders is much faster with samba (yes using libgfapi vfs) - where should i look for fixing context switching issues?
16:25 JoeJulian @lucky kernel process context switching
16:25 glusterbot JoeJulian: https://en.wikipedia.org/wiki/Context_switch
16:25 CU-Paul JoeJulian: Absolutely re: logstash.  vol status shows one of the bricks offline, is that expected with glusterd running and glusterfsd stopped?
16:26 JoeJulian akay: it's not something you can get away from unless you write your software to use libgfapi directly.
16:26 JoeJulian CU-Paul: yes
16:27 JoeJulian CU-Paul: I'm wondering if you should just "remove-brick ... commit force" that brick that you're keeping offline, for now.
16:27 JoeJulian Maybe it's stuck in a healing queue.
16:27 JoeJulian I haven't looked at how they schedule that recently.
16:28 CU-Paul JoeJulian: That is an option.  After this datastore outage we have been running on the single brick.  And since I haven't been able to get the two bricks synced back up, I'm more than willing to remove the two additional bricks and bring them back in.
16:28 akay ahh. i guess it's most noticeable when i connect to the volume with samba directly as opposed to regular samba pointed to a fuse mount
16:38 Leildin akay, how did you connect with samba directly ?
16:42 Rapture joined #gluster
16:51 JoeJulian Leildin: there's a vfs layer in samba that uses libgfapi.
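
A hedged sketch of a share using that layer (vfs_glusterfs ships with Samba builds linked against libgfapi; the share and volume names are placeholders and option names may vary by Samba version):

    # smb.conf
    [gv0]
        path = /
        vfs objects = glusterfs
        glusterfs:volume = gv0
        glusterfs:volfile_server = localhost
        kernel share modes = no
        read only = no
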
16:53 mhulsman joined #gluster
16:54 mhulsman1 joined #gluster
17:00 Leildin would a direct samba mount be better than samba pointing to a fuse mount for smaller files ?
17:01 JoeJulian yep
17:01 Leildin I'll look into it thanks
17:01 haomaiwa_ joined #gluster
17:01 Leildin did you see my paste earlier on ?
17:01 bennyturns joined #gluster
17:02 JoeJulian maybe...
17:02 JoeJulian It's been a long time. Refresh me.
17:02 Leildin http://ur1.ca/nu8ql
17:02 glusterbot Title: #270147 Fedora Project Pastebin (at ur1.ca)
17:03 Leildin the info file in glusterd/vols/
17:03 JoeJulian Oh, right. No, nothing wrong with that.
17:04 Leildin thought so ... then I do not know why it regenerated bad vol files after data-rebalance
17:06 JoeJulian Because it changed the info as it had finished which triggered a new vol file generation.
17:06 Leildin why did it fill it with crap though ... everything's working fine and info file's good ! :(
17:07 kanagaraj joined #gluster
17:07 Leildin I changed my bug classification too, thanks for making his request explicit
17:09 skoduri joined #gluster
17:09 togdon joined #gluster
17:10 cyberbootje joined #gluster
17:14 cholcombe joined #gluster
17:17 skoduri joined #gluster
17:28 pdrakeweb joined #gluster
17:58 theron joined #gluster
18:02 calavera joined #gluster
18:21 shaunm joined #gluster
18:24 calavera joined #gluster
18:31 skylar joined #gluster
18:33 skylar Hi folks, I have a question about setting up an arbiter brick with Gluster 3.7.4 (on RHEL6 if that matters)
18:33 skylar I'm running into a problem where gluster clients are seeing empty files, almost as if they're treating the arbiter brick as holding data when it really just holds metadata
18:34 skylar if I mount gluster on the storage nodes, everything looks fine though
18:34 skylar but the clients see empty files even if they created the files themselves
18:34 skylar the command I used to create the volume looked like this: gluster volume create some-vol replica 3 arbiter 1 transport tcp node1:/path/to/brick node2:/path/to/brick node3:/path/to/brick
18:34 skoduri joined #gluster
18:36 jwd joined #gluster
18:42 JoeJulian skylar: Are you sure all your servers and clients are >= version 3.7.0
18:43 skylar joejulian - yeah, I checked all my server/client version levels
18:43 skylar all are 3.7.4
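
For completeness, the checks being discussed, as a sketch (run on every server and client; the volume name comes from the create command above):

    glusterfs --version
    gluster volume info some-vol   # confirm the replica/arbiter layout the volume was created with
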
18:43 jwaibel joined #gluster
18:43 pdrakeweb joined #gluster
18:45 _joel hi all, i inherited a gluster setup and need to take a node offline for maintenance, but im unsure how to stop the glusterfs/d processes without taking the whole cluster down.  running "service glusterfs-server stop" doesnt stop the replication between the nodes.  how do i put a node in maintenance mode?  i'm on 3.4.4
18:47 JoeJulian Personally, if I want to kill all the gluster services on one server: I check 'gluster volume heal $vol info'' to make sure it's healthy, then 'pkill -f gluster' which would kill all clients, servers, and management - which works for the way I have things configured.
18:48 vxitch joined #gluster
18:48 vxitch Hello! I am having an impossible time trying to heal my 2 brick replicated volume.
18:49 _joel @JoeJulian: then to start it back up just run service gluster-server start?
18:49 vxitch "$ gluster volume heal engine info" keeps showing the same entries over and over. nothing is changing and the two bricks do not match.
18:50 JoeJulian _joel: yes.
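
Put together, a sketch of that maintenance flow on the node being taken down (service name matches the 3.4-era packaging mentioned above):

    gluster volume heal myvol info   # confirm nothing is pending heal
    pkill -f gluster                 # stops glusterd, brick (glusterfsd) and local client processes
    # ... perform maintenance ...
    service glusterfs-server start
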
18:52 vxitch Does anyone know how the heal mechanism works and why it isn't doing anything for me?
18:53 vxitch I really need help with this. Gluster has been holding back my deployment for a week.
18:53 JoeJulian vxitch: what version?
18:53 vxitch 3.7.4 on RHEL 7.1
18:54 JoeJulian Hmm, shouldn't be a false positive then.
18:55 JoeJulian How about "gluster volume heal engine info heal-failed"
18:55 vxitch brick2 got corrupted (it's on another host on zfs. gluster started before zfs mounted. cue hilarity) and I put everything back as it should be but only brick1 had data. brick2 was empty despite having the right attr on the brick dir.
18:56 vxitch JoeJulian: Command not supported. Please use "gluster volume heal engine info" and logs to find the heal information.
18:56 JoeJulian I kind-of thought so, but was curious since it's still in the help.
18:57 vxitch Yeah I ran into the same thing. Apparently it was all bunched into heal <vol> info, including split-brain, but that command remains.
18:58 JoeJulian vxitch: which brick are the heal entries listed under:
18:58 JoeJulian ?
18:58 vxitch brick1
18:58 vxitch which is the known-good brick
18:59 JoeJulian I would check /var/log/glusterfs/glustershd.log on both servers for clues.
18:59 JoeJulian left #gluster
19:00 JoeJulian joined #gluster
19:00 JoeJulian oops
19:01 JoeJulian skylar: I can't find anything. Check logs.
19:01 skylar thanks, running through them now
19:01 skylar have glusterfs running w/ --log-level=DEBUG
19:01 vxitch JoeJulian: brick1's log is fine. it shows engine connecting and being ok. brick2's log is unhappy: http://hastebin.com/oneyigiwom.coffee
19:01 glusterbot Title: hastebin (at hastebin.com)
19:02 vxitch I don't know how to interpret that log
19:05 skylar don't see anything obviously wrong
19:06 skylar I might just start from scratch on the storage nodes though
19:06 skylar delete /var/lib/glusterd and clear out the bricks
19:06 skylar wonder if I have some cruft from prior tests
19:07 JoeJulian That could make some sense.
19:07 mhulsman joined #gluster
19:09 JoeJulian vxitch: looks like it might be failing the direct_io test. Check the brick log.
19:10 vxitch JoeJulian: it looks to accept client then immediately disconnect
19:10 vxitch repeatedly
19:10 JoeJulian Well that's not very useful.
19:11 vxitch yup :( exactly how I feel about all the feedback gluster gives
19:11 vxitch the only other thing echoes the file not found in the other log: http://hastebin.com/ecicawuhuk.coffee
19:11 glusterbot Title: hastebin (at hastebin.com)
19:12 vxitch this is supposed to be the datastore for a hypervisor cluster
19:12 Slashman joined #gluster
19:12 vxitch but it doesn't seem to be very stable. it is extremely frustrating.
19:12 vxitch is this typical? all I want is replication across 2 hosts.
19:13 skylar sadly deleting the underlying brick contents plus /var/lib/glusterd did not help
19:15 skylar what's the best doc for just using a tiebreaker system for quorum? the docs I found seem to be version-dependent and I haven't found one for v3.7 yet
19:19 cyberbootje joined #gluster
19:19 skylar as best as I can tell I want cluster.server-quorum-type=server and cluster.server-quorum-ratio=51
19:19 skylar and I can leave what was going to be arbiter node as a peer in the pool, but have no bricks on it
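
A sketch of setting those options; cluster.server-quorum-ratio appears to be applied cluster-wide via "all" (volume name is a placeholder):

    gluster volume set myvol cluster.server-quorum-type server
    gluster volume set all cluster.server-quorum-ratio 51%
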
19:20 Philambdo joined #gluster
19:20 Pupeno joined #gluster
19:20 Pupeno joined #gluster
19:21 bennyturns joined #gluster
19:35 vxitch anyone here use Apache Helix?
19:38 vxitch or any other replicated FS?
19:43 Pupeno joined #gluster
19:50 vxitch no help in the useless and outdated docs, no help on here, and on top of it all the gluster project doesn't take stability seriously. thanks for nothing. this project is doomed. and I have been singing the praises of gluster since 3.3. no more.
19:51 vxitch can't even do replication. shame.
20:07 rotbeard joined #gluster
20:09 JoeJulian vxitch: You sound frustrated.
20:10 Pupeno joined #gluster
20:10 JoeJulian You also sound like you'd like to take out your frustration on people that are not deserving of it. Please consider that.
20:10 ro_ joined #gluster
20:10 vxitch JoeJulian: Heh, yes :) I am incredibly frustrated. Which is out of character for me. It's just that I have enjoyed Gluster for a long time, despite its shortcomings. But lately it has been blocking a critical project for over a week, and I have been finding no solution.
20:11 vxitch I was hoping for some help here. I was on here a couple days ago, to utter silence. And then again today, got no help deciphering the mystical logs gluster outputs.
20:11 JoeJulian I know how that goes.
20:11 vxitch It seems to me like this project is doomed. And that sucks, since without a pricey central storage solution, I am out of luck.
20:12 vxitch As a non-profit, that's not an option.
20:12 vxitch ah well.
20:12 JoeJulian btw... I work for IO Datacenter, not Red Hat. I have my own job there to do and help out here for fun.
20:12 JoeJulian And doom has been called for the last 6 years that I've been around.
20:12 vxitch I'm hoping for more than just you to be active on here. Yet..
20:13 JoeJulian Since then the developer team has quadrupled and the number of users in this channel has climbed 400%.
20:13 JoeJulian And yes, I'd like that too. :)
20:13 JoeJulian It happens, but it runs hot and cold.
20:13 vxitch And yet it's still unreliable software, the devs apparently need a couple good managers because the documentation is awful and the user experience just as much, and this channel might as well have 3 users on it.
20:13 JoeJulian People get busy.
20:14 JoeJulian Did you see the docs 6 years ago?
20:14 JoeJulian It's way better.
20:14 vxitch No, I did see them when the website switch happened
20:14 vxitch and it is better
20:14 vxitch but better unfortunately hasn't yet reached usable
20:14 JoeJulian And it's been reliable for me and a lot of other people, so maybe it's something else.
20:15 vxitch I wouldn't be so appalled if the project wasn't claiming v3
20:15 JoeJulian There's some huge companies that use it regularly.
20:15 vxitch Maybe it is. But after getting a huge multinational company on it for a few months, they dumped it too.
20:15 vxitch With a *very* bright IT team behind it all, on top of that
20:15 vxitch because it's pretty awful for v3 software
20:16 vxitch and I felt like a dummy for singing its praises when they dumped it
20:16 JoeJulian not really
20:16 vxitch but I thought I was wrong, maybe it just didnt fit their use case
20:16 vxitch and here I am. trying to get a replicated volume up. and having it blow up with 0 indication why
20:16 vxitch so now I'm pretty sure it isn't just them, or just me
20:17 JoeJulian Ok, well tell me that I'm wasting my effort helping you then and I'll get back to work.
20:17 vxitch we're just shooting the shit at this point, to be fair
20:17 vxitch which, I do appreciate
20:17 JoeJulian Fair enough.
20:18 vxitch anyway. there's someone at my university who used to run a decently sized gluster cluster with ovirt on it. i'll go see if he's still on it and maybe he has some ideas
20:18 VeggieMeat i've not had any real issues with gluster.... and i'm a one-man hosting provider running an 8-node 100TB cluster who only needed 2 days to get up to speed.... the only issue i had was early on trying to run with just two nodes - that gets into split brain very quickly
20:19 vxitch VeggieMeat: unfortunately I only have 2 nodes. they're chunky enough so 2 are sufficient. Yes the split brain resolution is what is awful here.
20:20 JoeJulian pfft
20:20 vxitch the nodes are guaranteed to go into split brain with just 2 of them around. yet they cant fix themselves, and the docs don't help in resolving it manually
20:20 JoeJulian With 3.7 that's complete and utter bullshit, pardon my language.
20:20 JoeJulian @split-brain
20:20 glusterbot JoeJulian: To heal split-brains, see https://gluster.readthedocs.org/en/release-3.7.0/Features/heal-info-and-split-brain-resolution/ For additional information, see this older article https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ Also see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
20:20 vxitch and here I am, on 3.7.4
20:20 JoeJulian See the first link.
20:20 vxitch I have that open already
20:21 JoeJulian Plain, simple, policy based.
20:21 JoeJulian Just what people have been asking for for years.
20:21 vxitch Let me check the heal status again
20:21 JoeJulian Your issue doesn't seem to be from split-brain, does it.
20:22 JoeJulian Yours seems to be something completely different.
20:22 dgbaley joined #gluster
20:22 vxitch At first I thought it was
20:22 vxitch heal info split-brain doesn't list any split brained nodes
20:22 JoeJulian You claim that the filesystem wasn't mounted, but if it hadn't been, the brick just wouldn't have started.
20:22 vxitch the mount point existed despite the fs not being mounted
20:22 JoeJulian Right, but the volume-id wouldn't have been there.
20:22 vxitch right
20:23 JoeJulian So unless someone created it by hand, the brick wouldn't have started.
20:23 vxitch but somehow it started, then the fs mounted on top, and something funky and mysterious happened
20:23 vxitch all I did was reboot the host
20:24 vxitch and zfs failed to import automatically because it was still referring to disks via sdx instead of by-ID. I wasn't aware of it and fixed that then imported
20:24 JoeJulian So, the equivalent of writing partial inode tables to a running filesystem happened and you expect any filesystem to recover automatically?
20:24 vxitch since it's a file based fs, and the mount was recreated with no data on it, yes
20:24 vxitch i expect gluster to see no files exist, but that it's a valid brick (as it reports)
20:24 vxitch and to copy the data from brick1 to brick2 and get me on my way
20:25 vxitch instead, nothing happens and the logs don't say what or why
20:25 togdon joined #gluster
20:25 vxitch can i nuke it all on brick2 while leaving brick1 intact and start over?
20:25 JoeJulian The sequence of events and the state that you're in suggest that there is information missing. The brick can't start without the volume-id xattr being set. That's by design to prevent what you're describing.
20:26 JoeJulian Yes.
20:26 JoeJulian Just kill the brick, wipe it, create the volume-id xattr, start the brick again, heal ... full.
20:26 vxitch i've done that twice now. but i'll try it again
20:27 JoeJulian I'm betting the brick service was up.
20:27 vxitch volume stop, rm -rf, mkdir, setfattr, volume start, volume heal ?
20:27 vxitch stop glusterd altogether then?
20:27 togdon joined #gluster
20:27 JoeJulian glusterd = admin service. glusterfsd = brick service.
20:27 JoeJulian And yes, volume stop should have done that.
20:27 vxitch right, thanks
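
A sketch of that sequence for the volume in question, with placeholder brick paths; the hex id must be copied from the healthy brick:

    gluster volume stop engine
    getfattr -n trusted.glusterfs.volume-id -e hex /bricks/good/brick   # note the 0x... value
    rm -rf /bricks/bad/brick && mkdir -p /bricks/bad/brick
    setfattr -n trusted.glusterfs.volume-id -v 0x<value-from-above> /bricks/bad/brick
    gluster volume start engine
    gluster volume heal engine full
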
20:28 vxitch i always run a volume status to make sure it's down
20:28 vxitch well. no harm in trying again
20:28 JoeJulian ok
20:28 vxitch thanks
20:29 JoeJulian While you're doing that, I'm going to create a new replica 2 volume and duplicate that process and make sure it hasn't changed.
20:29 vxitch mk
20:29 skylar JoeJulian - on the split brain topic, should having a tiebreaker/quorum node (no storage bricks, just a peer in the pool) prevent split brain entirely?
20:32 JoeJulian I never say never.
20:32 JoeJulian But it should.
20:32 vxitch did it, heal info (after heal full) reports the same exact thing as before. 3 on brick1, 0 on brick2 (which is empty save for .glusterfs dir)
20:32 vxitch volumes are ok according to volume status
20:32 atalur joined #gluster
20:32 vxitch ensured the glusterfsd service was down before starting, volume stop did stop it indeed
20:33 vxitch skylar: how would I go about setting that up? in theory
20:33 vxitch peer probe but no brick on a node?
20:34 JoeJulian heal full?
20:34 skylar vxitch - yep, that's how I'm approaching the problem
20:34 vxitch JoeJulian: yup
20:35 poornimag joined #gluster
20:35 vxitch skylar: how would you set the quorum options on the volume? any different from a typical 3 node distributed setup with quorum?
20:36 skylar I think you'll want cluster.server-quorum-type=server and cluster.server-quorum-ratio=51 for a 2-replica setup and one quorum node
20:36 vxitch ah, ok cool.
20:36 vxitch if I end up redoing this cluster I'll give that a shot
20:36 ro_ I'm getting a request timeout when I do a 'gluster volume create testing replica 2 transport tcp' with 8 nodes. Is there some kind of limit to what that command can handle at once?
20:37 skoduri joined #gluster
20:38 JoeJulian I just tried the exact same process and it worked, so let's try one more thing. Stop the volume again. Since we're going to heal..full, this won't matter. tar it up if it makes you feel better though. stop the volume, rm $brick1_root/.glusterfs/indices/xattrop $brick1_root/.glusterfs/*.db* , start the volume and heal..full again.
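
The same steps as a sketch, with a placeholder brick path (tar up the directories first if in doubt, as suggested):

    gluster volume stop engine
    rm -rf /bricks/brick1/.glusterfs/indices/xattrop /bricks/brick1/.glusterfs/*.db*
    gluster volume start engine
    gluster volume heal engine full
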
20:39 vxitch okay let me try that
20:39 skoduri kkeithley_, ping
20:39 glusterbot skoduri: Please don't naked ping. http://blogs.gnome.org/mark​mc/2014/02/20/naked-pings/
20:42 JoeJulian So we've got the president of China, Xi, visiting Seattle and messing up traffic today.
20:42 JoeJulian Is the next Chinese president going to be Xii? :D
20:42 vxitch JoeJulian: removed brick2, recreated + setfatter, rm'd as you suggested on brick1. same old thing
20:43 vxitch no change from last time. 3 need healing on brick1, 0 on brick2, still no data on brick2
20:43 JoeJulian Wait.. that's not possible.
20:43 skylar JoeJulian - you're in Seattle too? I'm lucky in that I'm at UW and live in Fremont, but my wife gets to come home from Redmond
20:43 JoeJulian We just wiped that data.
20:43 skylar hopefully she makes it before tomorrow
20:43 JoeJulian I actually work from home in Edmonds.
20:43 vxitch JoeJulian: whoops hold on
20:44 vxitch did it on the wrong brick
20:44 JoeJulian hehe, ok.
20:44 vxitch (s'ok that one's also screwed.)
20:45 JoeJulian Oh! Yeah, skylar, I see you're in sasag. :D Say hi to everybody for me.
20:45 skylar hah, yes, I'm that skylar :)
20:45 skylar I must say that fixing split brain in gluster is so much easier than GPFS. Had that happen once, that was a real headache
20:45 vxitch okay, heal info shows 0
20:46 vxitch phew
20:46 JoeJulian Excellent. And the next question, is it healing?
20:46 vxitch how do I figure that out?
20:46 vxitch no data on brick2 yet
20:46 JoeJulian I generally just watch for data.
20:46 vxitch nothing yet
20:47 JoeJulian You did a heal...full after that, right?
20:47 vxitch lots of small files, would expect osmething by now
20:47 vxitch yes
20:47 JoeJulian ok
20:47 JoeJulian I wonder if we should kill all the glustershd and restart all the glusterd.
20:47 JoeJulian Just to be thorough.
20:48 vxitch yeah, sure
20:48 JoeJulian I do wish there was a better window into what shd was doing.
20:48 JoeJulian To which I tell myself to file a bug
20:48 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
20:50 theron joined #gluster
20:52 vxitch JoeJulian: destroyed+recreated brick2, removed .glusterfs/ artifacts from brick1, stopped+started glusterd, started the volume, heal full, heal info = 0
20:52 vxitch still no data
20:52 vxitch pretty sure it's a lost cause :)
20:52 JoeJulian Anything in the shd logs?
20:53 skylar I just did another split-brain (yank the network cord) test, and I'm still getting split-brain files even with cluster.server-quorum-type=server
20:54 vxitch nothing. says it's online, connects, attaches, starts full sweep, finishes full sweep, EOL
20:54 vxitch skylar: bummer
20:54 JoeJulian On both servers?
20:54 skylar yeah
20:54 vxitch JoeJulian: full sweep only shows on brick2, brick1 just connects and attaches and sits
20:55 JoeJulian Which one's empty?
20:55 vxitch brick2
20:55 vxitch has the empty brick
20:55 JoeJulian check the load on 1, maybe it's building a directory tree or something? Could that be possible?
20:55 kkeithley_ skoduri: pong, what's up?
20:55 skylar the issue isn't a big deal for us because the application deals with it cleanly, as long as we don't forget to heal split-brain first
20:56 vxitch nothing eating CPU on either host
20:57 vxitch i'm thinking of nuking from orbit, reinstalling OS, reconfiging zfs and gluster, and hoping for the best
20:58 vxitch it'd suck, but with cobbler and salt it's not terrible. still half a day though, including oVirt install
21:00 rotbeard joined #gluster
21:05 JoeJulian yuck.
21:05 JoeJulian What if you mount a fuse client and do a "find".
21:05 vxitch I did that initially. If I mount the brick1 host, find works
21:06 vxitch if i mount the brick2 host, find doesnt find anything
21:06 vxitch and nothing is triggered on the volume by finding on either host mount
21:06 poornimag joined #gluster
21:08 vxitch yeah, just tried it again. mounted both hosts one by one, did a find. checked heal info and the brick dir after, no change on brick2
21:09 theron joined #gluster
21:16 JoeJulian Is that still true? mount it from one server it works, the other doesn't?
21:17 vxitch the mounts always go through, but mounting from the brick1 server will have my files as expected
21:17 vxitch mounting from the brick2 server does not
21:17 JoeJulian Whoah, that's got to tell us something.
21:17 JoeJulian Because ,,(mount server)
21:17 glusterbot (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns, or (#2) One caveat is that the clients never learn of any other management peers. If the client cannot communicate with the mount server, that client will not learn of any volume changes.
21:17 vxitch yeah, that's what I knew as well
21:18 vxitch and yet..
21:18 vxitch connectivity exists betwween the two hosts on an ad-hoc network (directly connected)
21:18 vxitch iptables is disabled on both hosts for this testing
21:21 JoeJulian vxitch: let me see /var/lib/glusterd/vols/engine/engine.tcp-fuse.vol
21:22 JoeJulian Theoretically from either server should be the same.
21:27 vxitch JoeJulian: here you go http://hastebin.com/elesofimek.hs
21:27 glusterbot Title: hastebin (at hastebin.com)
21:30 JoeJulian vxitch: ok, how about a clean client log when mounting from the server that shows an empty volume.
21:31 JoeJulian Also, do gluster-1.local and gluster-2.local resolve correctly on both the servers and the client?
21:31 JoeJulian (I assume yes, but this is odd)
21:31 vxitch they do
21:31 vxitch I did a reboot
21:32 vxitch and rm -rf the /virtstore dir entirely
21:32 vxitch them imported zfs pool
21:32 vxitch then started the volume and waited
21:32 vxitch and somehow the data is back
21:32 vxitch this is all on brick2 by the way
21:33 vxitch I have no idea what gluster thought it was doing
21:33 vxitch but now it works
21:33 vxitch that was freaky
21:33 vxitch not the first reboot i've done since this started
21:33 vxitch and yet..
21:33 JoeJulian Well... um... congratulations.
21:33 vxitch yes
21:34 JoeJulian I'm really happy you got it working.
21:34 vxitch I'm very wary of trusting my VMs to something so fidgety
21:34 vxitch er
21:34 vxitch I am too, but just as confused
21:34 vxitch I supposed...pray it doesn't happen again
21:34 JoeJulian I wish we knew what was wrong...
21:34 vxitch yes
21:34 JoeJulian Could the filesystem have gone RO?
21:34 vxitch thanks for taking so much time to go in circles with me
21:34 vxitch nah, I checked that
21:35 JoeJulian I'm going to blame zfs... ;)
21:35 vxitch I could read/write/modify files on it during all this
21:35 vxitch I am as well
21:35 vxitch I don't know why or how, and the heal mechanism is not making me happy, but zfs is what would make sense. maybe.
21:35 JoeJulian Hey, thanks for sharing your solution. My media server is running gluster on zfs. Now I know one more thing to look for if that ever happens.
21:36 vxitch yeah, make doubly sure zfs imports into an empty dir
21:36 vxitch sigh
21:36 vxitch one day storage will not cost so much, and will be reliable to boot
21:36 vxitch :P
21:36 JoeJulian lol
21:37 JoeJulian Not if we keep trying to pack data in at the quantum level.
21:37 JamesG joined #gluster
21:38 bluenemo joined #gluster
21:38 JoeJulian Just make a quantum uncertainty receiver. Expect that in some parallel universe, someone is transmitting the data you need.
21:39 vxitch haha
22:07 DV joined #gluster
22:09 coreping joined #gluster
22:10 leucos joined #gluster
22:11 skylar joined #gluster
22:13 cabillman joined #gluster
22:13 shortdudey123 joined #gluster
22:14 atrius joined #gluster
22:16 frankS2 joined #gluster
22:16 bitpushr joined #gluster
22:17 vincent_vdk joined #gluster
22:18 malevolent joined #gluster
22:19 TheCthulhu joined #gluster
22:27 rotbeard joined #gluster
22:29 Ramereth joined #gluster
22:30 pdrakeweb joined #gluster
22:37 RedW joined #gluster
22:43 zhangjn joined #gluster
22:45 zhangjn joined #gluster
22:45 dgbaley joined #gluster
22:59 Slashman joined #gluster
23:04 skoduri joined #gluster
23:04 poornimag joined #gluster
23:10 dgbaley joined #gluster
23:12 suliba joined #gluster
23:17 David_Vargese joined #gluster
23:25 gildub joined #gluster
23:30 suliba joined #gluster
23:37 akay Leildin: if youre still around... MUCH better small file performance with samba vfs
23:45 skoduri_ joined #gluster
23:48 poornimag joined #gluster
23:50 Mr_Psmith joined #gluster
