
IRC log for #gluster, 2014-04-18


All times shown according to UTC.

Time Nick Message
00:00 Matthaeus joined #gluster
00:12 Matthaeus joined #gluster
00:24 pdrakewe_ joined #gluster
00:27 yinyin- joined #gluster
00:47 dbruhn joined #gluster
01:02 davinder joined #gluster
01:13 nveselinov joined #gluster
01:14 dbruhn joined #gluster
01:19 jmarley joined #gluster
01:19 jmarley joined #gluster
01:21 japuzzo joined #gluster
01:31 yinyin joined #gluster
01:36 baojg joined #gluster
01:57 baojg joined #gluster
02:01 haomai___ joined #gluster
02:17 khchen joined #gluster
02:19 haomaiwang joined #gluster
02:21 seddrone joined #gluster
02:35 bharata-rao joined #gluster
02:35 suliba joined #gluster
02:36 suliba joined #gluster
02:44 Ark joined #gluster
02:57 jag3773 joined #gluster
03:01 nightwalk joined #gluster
03:05 cdez joined #gluster
03:06 _NiC joined #gluster
03:08 nveselinov_ joined #gluster
03:09 nveselinov__ joined #gluster
03:32 davinder joined #gluster
04:08 adam__11 joined #gluster
04:09 lalatenduM joined #gluster
04:10 adam__11 hi, how can i bind glusterfs to multiple ip addresses? for security reasons i need to limit the bind-address, but if i set a private ip like 10.x.x.x it no longer binds 127.0.0.1, and glusterfs always connects to localhost, so clients can not mount the volume via glusterfs-fuse
04:11 adam__11 or how can i make glusterfs connect to the private ip instead?
04:11 adam__11 thank you
04:12 adam__11 i can not find the related things in docs
04:13 dbruhn adam__11, what is the output of gluster peer probe from your first and second server?
04:14 baojg_ joined #gluster
04:16 adam__11 E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
04:16 adam__11 in etc-glusterfs-glusterd.vol.log
04:17 dbruhn so you are not able to peer probe?
04:17 adam__11 no, peer status is ok
04:17 adam__11 /usr/sbin/glusterfs -s localhost  in the glustershd.log
04:17 dbruhn agh sorry, I mistyped
04:17 adam__11 i think it always use localhost with glusterfs process
04:17 dbruhn what is the output of peer status
04:18 adam__11 peer status
04:18 adam__11 Number of Peers: 2
04:18 adam__11 Hostname: 192.168.0.x
04:18 adam__11 Uuid:xxxx
04:18 adam__11 State: Peer in Cluster (Connected)
04:18 adam__11 Hostname: 192.168.0.x
04:18 adam__11 Uuid: xxx
04:18 adam__11 State: Peer in Cluster (Connected)
04:19 dbruhn is it showing local host from any of the servers when you run that?
04:19 adam__11 yes, just glusterfs process is not running
04:20 dbruhn I'm not sure what you mean
04:22 adam__11 ok, if i bind 0.0.0.0, the glusterd, glusterfsd and glusterfs processes all run. if i bind 192.168.x.x, only glusterd and glusterfsd run: glusterfs uses localhost to connect, and since nothing listens on 127.0.0.1 it can not start. so my question is, how can i bind multiple addresses, like 192.168.x.x and 127.0.0.1?
04:23 adam__11 or how can i change the behavior of glusterfs so that it uses 192.168.x.x to connect?
04:25 dbruhn adam__11, use hostnames, and then you can connect however you need to.
04:25 dbruhn if you don't have dns, you can use your hosts file to manipulate directing things around
04:27 adam__11 thank you, i will try
04:28 dbruhn If your peers are all listed via peer status as ip addresses reprobe them until they all read hostnames
04:28 dbruhn gluster sends a manifest to the clients based on that information
04:28 chirino joined #gluster
04:28 dbruhn so if any are reading localhost it's going to try and connect to localhost
04:29 dbruhn hence why I was asking for that information from multiple servers in the cluster
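A minimal sketch of the hostname approach dbruhn describes above, using placeholder names (gluster1/gluster2) and addresses that are not from the log:

    # /etc/hosts on every server and client, if no DNS is available
    192.168.0.10   gluster1
    192.168.0.11   gluster2

    # re-probe so that "gluster peer status" reports hostnames instead of IPs
    gluster peer probe gluster2      # run on gluster1
    gluster peer probe gluster1      # run on gluster2; replaces gluster1's IP entry with its hostname
    gluster peer status              # peers should now be listed by hostname

With hostnames in place, the volume layout handed to clients refers to names each client can resolve however it likes (DNS or its own hosts file), instead of an address that may resolve to localhost.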
04:30 baojg joined #gluster
04:34 Ark joined #gluster
04:59 wgao_ joined #gluster
05:00 dreville joined #gluster
05:01 Humble joined #gluster
05:11 adam__11 joined #gluster
05:16 rjoseph joined #gluster
05:25 hagarth joined #gluster
06:01 chirino joined #gluster
06:16 edward2 joined #gluster
06:17 Philambdo joined #gluster
06:22 andreask joined #gluster
06:24 ngoswami joined #gluster
06:30 vipulnayyar joined #gluster
06:39 caosk_kevin joined #gluster
06:39 haomaiwang joined #gluster
06:42 benjamin_____ joined #gluster
06:42 vimal joined #gluster
06:44 caosk_kevin hi all, if i mount gluster's replicated volume on clients through NFS (not the gluster native client), how do i achieve high availability?
06:44 dbruhn caosk_kevin, most people use rrdns
06:45 haomaiwa_ joined #gluster
06:49 caosk_kevin dbruhn: thanks, and how do i get good concurrency and performance using NFS (not the gluster native client)?
06:50 ndevos caosk_kevin: you can rrdns through all storage servers, different clients will then use different servers - distributed like that
06:51 ndevos caosk_kevin: you can also give each storage server a virtual-ip, and migrate that (pacemaker or ctdb or ...) when that specific server goes offline
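A rough illustration of the round-robin DNS setup ndevos describes, with a made-up name and addresses (assumptions, not taken from the log):

    ; DNS zone snippet: one name with one A record per storage server,
    ; answers are handed out round-robin so clients spread across servers
    gluster-nfs   IN  A   192.168.0.10
    gluster-nfs   IN  A   192.168.0.11
    gluster-nfs   IN  A   192.168.0.12

    # each NFS client mounts Gluster's built-in NFSv3 server through that name
    mount -t nfs -o vers=3 gluster-nfs:/myvol /mnt/myvol

This spreads clients across servers but does not move an already-established mount if its server dies; that is what the virtual-ip plus pacemaker/ctdb variant adds.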
06:53 adam__11 @dbruhn, that did not work. i ran /usr/sbin/glusterfs -s localhost … manually and replaced localhost with the private ip, and that worked
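adam__11 never pastes his configuration, but the behaviour he describes matches restricting glusterd to a single address; a sketch of the relevant option, assuming /etc/glusterfs/glusterd.vol with 3.4-era defaults and a placeholder address:

    # /etc/glusterfs/glusterd.vol
    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        option transport-type socket,rdma
        # binding to a single non-loopback address means the daemons that glusterd
        # spawns with "-s localhost" (self-heal daemon, NFS server) can no longer
        # reach it on 127.0.0.1 -- the glustershd symptom quoted earlier in the log
        option transport.socket.bind-address 192.168.0.10
    end-volume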
06:57 davinder2 joined #gluster
06:57 caosk_kevin ndevos: thanks. does gluster run well on solaris? i installed the glusterfs server on solaris, but its NFS service does not work, and i can not mount the gluster volume on a client through NFS..
07:00 ndevos caosk_kevin: it may work on Solaris, but I'm pretty sure almost nobody runs and tests it on Solaris
07:01 ekuric joined #gluster
07:02 ndevos caosk_kevin: I would see the functioning on Solaris as 'experimental', you could file a bug with all details (versions, steps to reproduce, ..) and attract attention to it by sending an email to one of the mailinglists
07:02 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
07:03 ndevos caosk_kevin: maybe someone else has an interest in getting Gluster working on Solaris, I think there is the occasional mentioning, but nothing that I really paid attention to
07:05 caosk_kevin ndevos: right , thank you so much
07:08 ndevos caosk_kevin: no problem, I'd be interested to see some notes in a blog or mailinglist about your progress, and I'm sure others would appreciate that too (and may help out)
07:09 eseyman joined #gluster
07:11 caosk_kevin right, i will start on this work and report the problems i encounter; i hope i can count on your help
07:12 ricky-ticky joined #gluster
07:14 ctria joined #gluster
07:22 nshaikh joined #gluster
07:26 haomai___ joined #gluster
07:31 Andyy2 joined #gluster
07:42 fsimonce joined #gluster
07:45 psharma joined #gluster
07:58 glusterbot New news from newglusterbugs: [Bug 1089172] MacOSX/Darwin port <https://bugzilla.redhat.com/show_bug.cgi?id=1089172>
08:02 adam__11 left #gluster
08:06 andreask joined #gluster
08:14 ctria joined #gluster
08:33 chirino joined #gluster
08:48 Slashman joined #gluster
08:48 bala joined #gluster
08:58 ktosiek joined #gluster
08:59 ktosiek Hi! I seem to have problems with mounts that are not used for some time: they hang, log that server is not responding, reconnect, and only then work again. What might be that cause, and how do I debug something like this?
09:00 ktosiek The mount I'm talking about is only used for archiving WAL files from postgresql master, so it's touched only once in a few (5-30) minutes
09:04 chirino joined #gluster
09:13 ktosiek BTW is client (the FUSE one) connecting to all the servers when using replicated volumes?
09:21 caosk_kevin joined #gluster
09:29 pyqwer_ joined #gluster
09:51 qdk joined #gluster
09:58 glusterbot New news from newglusterbugs: [Bug 1089216] Meta translator <https://bugzilla.redhat.com/show_bug.cgi?id=1089216>
10:07 haomaiwa_ joined #gluster
10:11 haomai___ joined #gluster
10:17 edward2 joined #gluster
10:30 vimal joined #gluster
10:38 haomaiwang joined #gluster
10:44 haomaiw__ joined #gluster
10:46 haomai___ joined #gluster
10:50 jiku joined #gluster
10:56 Ark joined #gluster
11:05 pvh_sa joined #gluster
11:28 baojg joined #gluster
11:40 jag3773 joined #gluster
11:45 glusterbot New news from resolvedglusterbugs: [Bug 1015990] Implementation of command to get the count of entries to be healed for each brick <https://bugzilla.redhat.com/show_bug.cgi?id=1015990> || [Bug 1002940] change in changelog-encoding <https://bugzilla.redhat.com/show_bug.cgi?id=1002940> || [Bug 1010874] Dist-geo-rep : geo-rep config log-level option takes invalid values and makes geo-rep status defunct <https://bugzilla.redha
11:51 Ark joined #gluster
11:59 ira joined #gluster
12:14 jmarley joined #gluster
12:14 jmarley joined #gluster
12:25 cdez joined #gluster
12:29 cyberbootje joined #gluster
12:34 plarsen joined #gluster
12:34 pyqwer_ joined #gluster
12:57 pyqwer_ Hi, I'm considering setting up a 2-node glusterfs array which should then provide storage for VMWare machines hosted on separate nodes. Does anyone have experience with this?
12:58 pyqwer_ Currently we have a NAS-Box with RAID5 and want to increase reliability.
13:14 jmarley joined #gluster
13:14 jmarley joined #gluster
13:24 bala joined #gluster
13:34 lalatenduM joined #gluster
13:47 lmickh joined #gluster
13:48 rjoseph joined #gluster
13:55 dbruhn joined #gluster
14:03 wushudoin joined #gluster
14:06 gmcwhistler joined #gluster
14:08 primechuck joined #gluster
14:17 diegows joined #gluster
14:21 LoudNoises joined #gluster
14:33 mynameisdeleted joined #gluster
14:33 mynameisdeleted so.. gpfs vs lustre vs glusterfs...
14:33 mynameisdeleted have 12 3TB drives and 2 storage nodes
14:33 theron joined #gluster
14:33 mynameisdeleted want good read and write speed and guaranteed recovery in the event of any disk failure
14:34 mynameisdeleted also would like to be able to store directory structure and files less than 4KB in size on smaller solid state redundant array
14:34 mynameisdeleted so we  dont ask a platter drive to seek only to read 4KB
14:34 mynameisdeleted and we keep updatedb running fast
14:34 mynameisdeleted and find
14:37 baojg joined #gluster
14:41 pyqwer joined #gluster
14:41 pyqwer_ left #gluster
14:42 pyqwer Hi, has anyone a setup with gluster as a storage for VMWare hosts?
14:46 baojg joined #gluster
14:46 theron joined #gluster
14:53 dbruhn mynameisdeleted, have you tested with any of them yet? and are you planning on maintaining a separate volume for the smaller files
14:54 Guest40874 hello dbruhn
14:54 dbruhn hello Guest40874
14:55 Guest40874 so I found some interesting log entries from gluster this morning. I have a gluster volume mounted to 11 servers all pulling information out of the same file. One of the 11 servers network hung and I had to console in and restart it.
14:55 dbruhn pyqwer, I have seen some people using it via NFS for VMWare, but haven't done it myself
14:56 dbruhn Guest40874, what was in the log?
14:56 Guest40874 I got an info-level message around the time of the failure on the gluster server: "I [server-helpers.c:729:server_connection_put] 0-vertica-load-server: Shutting down connection vertica-prod-6-40028-2014/03/13-00:37:18:676195-vertica-load-client-0-0", and on the client "no subvolumes up" and "transport endpoint not connected".
14:57 Guest40874 can i set the heapsize for the glusterfs fuse mount point somehow? I had issues with hdfs fuse mount and increasing the heap size of the fuse jvm fixed it
14:58 Guest40874 I [server-helpers.c:729:server_connection_put] 0-vertica-load-server: Shutting down connection vertica-prod-6-40028-2014/03/13-00:37:18:676195-vertica-load-client-0-0
14:58 lmickh joined #gluster
15:01 pyqwer dbruhn: I see, thanks for info - is this something that can be recommended? Is gluster a good choice for this scenario?
15:02 dbruhn pyqwer, there are a lot of large gluster installs, and I think it's becoming increasingly popular to use it under KVM. The question is a little too wide to give a direct answer
15:02 dbruhn Guest40874, Gluster isn't java.
15:03 Guest40874 dbruhn: that was a silly assumption on my part.
15:04 pyqwer dbruhn: O.k., thing is, we currently use a simple NAS box with NFS and ESXi Clients and want to improve availability cost-efficiently so that the system is up in case the NAS fails.
15:04 dbruhn Guest40874, from what it looks like the client probably suffered a network drop
15:04 dbruhn pyqwer, which NAS are you using today?
15:05 pyqwer A thecus with RAID5.
15:05 pyqwer What I don't understand is the following: Can gluster directly export NFS? (seems so, right?) If yes, I assume I'd mount the NFS volume from the ESXi host. But what happens if the node dies?
15:05 pyqwer How would the switchover to the other node be done?
15:06 dbruhn Yes it can export NFS directly, and if a gluster server dies you would have to reconnect your vmhost to one of the other  gluster servers, or get the gluster server that died back online.
15:07 xathor ^^ This has been my experience
15:07 dbruhn A lot of people also use rrdns for a poor mans load balancer, connection manager
15:07 pyqwer I see - so the switchover is not done automatically. And for load balancing, I'd try to distribute the clients on the nodes manually, right?
15:08 dbruhn pyqwer, if you are using the glusterfuse client failover management is managed by gluster, but in the case of NFS you have a single point to connect to.
15:08 pyqwer I see - but I estimate, this client is for Linux only, right?
15:08 dbruhn so if the server you mounted to goes away it's kind of what it is
15:08 dbruhn yep
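The difference dbruhn describes, expressed as mount commands; the server names and volume name are placeholders:

    # native FUSE client (Linux): the client fetches the volume layout from server1 and then
    # talks to all replica bricks itself, so brick failover is handled client-side
    # (backupvolfile-server is only a fallback for fetching the volume file at mount time)
    mount -t glusterfs -o backupvolfile-server=server2 server1:/datavol /mnt/datavol

    # NFS: an ordinary NFSv3 mount against one server's gluster NFS service; if that server
    # goes away, the mount is stuck until the server returns or you remount via another server/VIP
    mount -t nfs -o vers=3 server1:/datavol /mnt/datavol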
15:09 pyqwer O.k. Some configurations/tutorials (like this here: http://myitnotes.info/doku.php?id=en:jobs:linux_gluster_nfs_for_vmware) use pacemaker + corosync - but for what?
15:09 glusterbot Title: en:jobs:linux_gluster_nfs_for_vmware [IT Notes about: Juniper, Cisco, Checkpoint, FreeBSD, Linux, Windows, VmWare....] (at myitnotes.info)
15:10 pyqwer Won't gluster itself handle reconnection of a node?
15:11 dbruhn I haven't used pacemaker + corsync, so the only thing I could suggest is to test it and see how it works out for you.
15:11 pyqwer So it's optional, right?
15:12 dbruhn I guess it's up to you to determine if it's optional, it seems it might give you a functionality you are looking for
15:13 pyqwer I see - seems I have to dig into this a bit deeper as I don't really understand it.
15:14 dbruhn Which part are you having a hard time understanding?
15:14 pyqwer How the switchover with pacemaker would work.
15:15 pyqwer I estimate, that in some way the IP address of the dead node has to be given to a living one.
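That is essentially what pacemaker does in the tutorial linked above: it keeps a floating IP alive on whichever node is still up, and the ESXi hosts only ever mount that address. A hedged sketch using the pcs shell and a placeholder address:

    # define a floating IP resource; pacemaker starts it on one node and
    # moves it to a surviving node if that node fails
    pcs resource create nfs_vip ocf:heartbeat:IPaddr2 \
        ip=192.168.0.100 cidr_netmask=24 op monitor interval=30s

    # ESXi then mounts the NFS datastore via 192.168.0.100, never via a node's own address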
15:16 Guest40874 dbruhn: yes, the network dies on the linux box and it needs to be restarted. I was wondering if I am overloading the gluster mount point; any advice on what I should look at / read up on?
15:17 dbruhn Guest40874, try and resolve the network issue first. I could be wrong, but I am assuming it is the issue, not gluster or storage related.
15:17 Guest40874 ok thank you good sir
15:17 dbruhn I have had issues in the past with things like network manager doing weird things with my connections
15:17 dbruhn so I usually disable it, and make sure there is no DHCP anything running on my systems.
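A sketch of what that looks like on a RHEL/CentOS 6-era storage node; the interface name and address are placeholders:

    # take NetworkManager out of the picture
    chkconfig NetworkManager off
    service NetworkManager stop

    # /etc/sysconfig/network-scripts/ifcfg-eth0 -- static address, no DHCP
    DEVICE=eth0
    ONBOOT=yes
    NM_CONTROLLED=no
    BOOTPROTO=none
    IPADDR=192.168.0.10
    NETMASK=255.255.255.0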
15:18 Slashman_ joined #gluster
15:20 jag3773 joined #gluster
15:20 dbruhn pyqwer, you'll have to forgive me as I am not super familiar with corosync or pacemaker, but to be honest these instructions look fairly simple. Why not set up a test cluster in VMs, try it out, and see if it does what you need it to.
15:20 Guest40874 dbruhn: ok thanks
15:21 pyqwer dbruhn: Thought about that, too. However, wanted to talk to someone first if this is a viable option.
15:21 dbruhn looks viable to me, and it looks like whoever wrote that has had it in production for over a year.
15:21 pyqwer dbruhn: O.k. Thanks for help!
15:22 dbruhn Of course, I would be really interested to see your results, and the community could always use another blog post about how to do these things. If you feel so inclined
15:23 pyqwer dbruhn: No problem, will do so (if I get it running). :-)
15:23 baojg joined #gluster
15:23 dbruhn I bet you will. I was actually thinking about trying to do something similar this morning, but using a pair of load balancers in the middle. Haven't had a chance to test it yet though.
15:32 lmickh joined #gluster
15:53 zaitcev joined #gluster
16:03 andreask joined #gluster
16:05 churnd joined #gluster
16:07 RameshN joined #gluster
16:14 Mo_ joined #gluster
16:26 lmickh joined #gluster
16:40 chirino joined #gluster
16:55 jbd1 joined #gluster
17:01 jag3773 joined #gluster
17:06 hagarth1 joined #gluster
17:13 chirino joined #gluster
17:34 primechuck joined #gluster
17:39 rotbeard joined #gluster
17:45 rotbeard joined #gluster
17:51 dbruhn @CTDB
17:51 dbruhn @ctdb
18:16 chirino joined #gluster
18:28 cfeller joined #gluster
19:00 edward1 joined #gluster
19:50 hagarth joined #gluster
19:50 sroy_ joined #gluster
20:12 ctria joined #gluster
20:14 chirino joined #gluster
20:25 JoeJulian @whatis ctdb
20:25 glusterbot JoeJulian: Error: No factoid matches that key.
20:26 JoeJulian @factoids search ctdb
20:26 glusterbot JoeJulian: No keys matched that query.
20:26 JoeJulian I thought I remembered a factoids about that too.
20:30 glusterbot New news from newglusterbugs: [Bug 1089414] Need support for handle based Ops to fetch/modify extended attributes of a file <https://bugzilla.redhat.com/show_bug.cgi?id=1089414>
20:32 jag3773 joined #gluster
20:35 dbruhn How do you tell the bot to learn something
20:35 dbruhn http://www.gluster.org/community/documentation/index.php/CTDB
20:35 glusterbot Title: CTDB - GlusterDocumentation (at www.gluster.org)
20:37 dbruhn @learn ctdb http://www.gluster.org/community/documentation/index.php/CTDB
20:37 glusterbot dbruhn: (learn [<channel>] <key> as <value>) -- Associates <key> with <value>. <channel> is only necessary if the message isn't sent on the channel itself. The word 'as' is necessary to separate the key from the value. It can be changed to another word via the learnSeparator registry value.
20:38 dbruhn @learn ctdb as http://www.gluster.org/community/documentation/index.php/CTDB
20:38 glusterbot dbruhn: The operation succeeded.
20:38 dbruhn @whatis ctdb
20:38 glusterbot dbruhn: http://www.gluster.org/community/documentation/index.php/CTDB
20:39 * jbd1 is annoyed that it took three clicks just to learn that "CTDB" stands for "Clustered Trivial Database"
20:40 dbruhn lol
20:40 dbruhn Update the wiki to be more meaningful to the holes you've found then ;)
20:41 jbd1 Good idea.  One nit I have with the GlusterFS documentation in general is that many parts of it assume that you've already read and understood many other parts.
20:42 dbruhn I think a lot of the questions we get in channel about best practices could be easily answered if the information was included on the wiki too
20:43 * jbd1 just discovered the 3.3 administrator guide after almost a year of administering it according to the 3.2 guide
20:44 dbruhn That 3.3 admin guide is a savior for quick reference for me.
20:44 dbruhn granted I am still actually running 3.3.x on my systems
20:45 jbd1 I plan to upgrade to 3.4 RSN but my fix-layout needs to finish first :)
20:46 dbruhn I have some systems on RDMA only and 3.4 in my limited testing isn't functioning with RDMA
20:46 chirino joined #gluster
20:46 dbruhn so I need to convert them over to tcp/rdma
20:47 dbruhn but that sounds daunting, and my lab equipment is in no place to do IB testing right now
20:49 CyrilP joined #gluster
20:50 jbd1 yeah, I've only seen negative stuff with RDMA/3.4 on the -users list
20:50 CyrilP Hi Gluster community !
20:51 CyrilP I posted an issue on the mailing list yesterday (http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039979.html)
20:51 glusterbot Title: [Gluster-users] Conflicting entries for symlinks between bricks (Trusted.gfid not consistent) (at supercolony.gluster.org)
20:51 CyrilP Any hints ?
20:52 jbd1 Hah, I remember reading that and thinking, "I'm glad he posted this to the list-- it would be tough to debug on IRC!"
20:52 CyrilP ^^
20:53 dbruhn CyrilP, it looks like something is in split-brain
20:53 dbruhn those Input Output errors are very characteristic of it
20:53 CyrilP looks yes but no split brain reported
20:54 Matthaeus joined #gluster
20:55 CyrilP the point is self healing is failing only on 5 symlinks, the trusted.gfid is different on the 2 bricks, and I was looking at how to fix them
20:55 dbruhn have you tried to fix the file/dir anyway just to make sure
20:55 CyrilP what do you mean by fix the file/dir? (they are the same on both sides, only trusted.gfid is different)
20:56 dbruhn that will come up as a split-brain
20:56 dbruhn even if the files are the same
20:56 dbruhn because the attribute isn't the same
20:57 CyrilP ok, but gluster never report a split-brain issue
20:57 jbd1 right, So Joe Julian's split-brain stuff is handy here, or you can just blow away the symlinks on one brick and let self-heal do the fixing
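The manual fix jbd1 refers to is roughly the following; the brick path, volume name and gfid are placeholders, and the copy you delete must be the one you have decided is bad:

    # compare the gfid of the disputed entry on each brick (-h: do not follow the symlink)
    getfattr -h -n trusted.gfid -e hex /export/brick1/path/to/link

    # on the bad brick only: remove the entry and its handle under .glusterfs
    # (the handle is named after the gfid; the first two byte pairs pick the sub-directories)
    rm /export/brick1/path/to/link
    rm /export/brick1/.glusterfs/ab/cd/abcdef01-....

    # then trigger self-heal, or simply stat the path through a client mount
    gluster volume heal myvol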
20:57 CyrilP but I agree that's the case
20:58 CyrilP I'll try that to fix it. will this happen every time we have to take a server down for maintenance?
20:58 dbruhn it shouldn't unless something goes awry
20:58 CyrilP self healing has worked fine for the other files
20:59 jbd1 unrelated: When debugging https://bugzilla.redhat.com/show_bug.cgi?id=1087960 on my systems, I found that the GFID file associated with a directory was a symlink, which pointed to another GFID symlink, about six levels deep, and finally pointed to a nonexistent GFID file under 00/00/00000000-0000-0000-0000-000000000001  -- not sure what that means
20:59 glusterbot Bug 1087960: unspecified, unspecified, ---, csaba, NEW , Client hangs when accessing a file, nothing logged
20:59 CyrilP I've seen this one, but not really our case
20:59 dbruhn 00000000-0000-0000-0000-000000000001 should link to ../../
21:00 jbd1 on my system, 00000000-0000-0000-0000-000000000001 didn't even exist
21:00 dbruhn sorry ../../..
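For reference, on a healthy brick the root gfid handle dbruhn is talking about is a symlink three levels back up to the brick root; something like this (the brick path is a placeholder):

    ls -l /export/brick1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
    # lrwxrwxrwx ... 00000000-0000-0000-0000-000000000001 -> ../../..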
21:00 dbruhn I have had it turn into a file and cause all sorts of havoc
21:02 jbd1 yeah that sounds bad
21:03 jbd1 I updated the ticket with my research.  I finally just deleted the whole directory on each brick and was finally able to avoid the client hang, but the issue should never have arisen to begin with
21:04 CyrilP Hmm interesting, I just found that the symlinks were linking to the same file in both bricks
21:04 CyrilP it could explain the "split-brain" like issue
21:05 jbd1 now I'm worried that I don't have a 00000000-0000-0000-0000-000000000001 symlink in 00/00
21:05 CyrilP but it should have replicated....
21:06 jbd1 CyrilP: when your node was down, there were no changes to it, so you shouldn't have experienced split-brain at all, no?
21:06 CyrilP no split-brain reported
21:06 CyrilP only self-healing
21:07 CyrilP but it failed for few files
21:07 CyrilP (symlinks)
21:07 jbd1 CyrilP: even if no split-brain was reported, gfid mismatch means split brain
21:07 CyrilP to be clear, the remaining node was still active with r/w jobs from clients,
21:07 CyrilP when we brought up the down one, it started to replicate/heal files
21:07 CyrilP but it failed for some of them, and I'm trying to figure out why...
21:08 CyrilP ok but... I just noticed that the symlinks are not "pointing" to the same file on the 2 bricks
21:09 CyrilP so the trusted.gfid being different sounds "normal"
21:09 jbd1 CyrilP: could the symlink have changed on the up nodes while the down node was out of the cluster?
21:09 dbruhn CyrilP, I would suggest pulling the file you feel is good, and then going through JoeJulian's split-brain blog entry and cleaning it form all bricks, and then put it back in place through the mount point
21:10 dbruhn form/from
21:10 CyrilP jbd1 sure it has,
21:12 CyrilP jbd1 the remaining brick was still being used by users and the build process, so many files were created, updated and deleted on this brick
21:12 jbd1 CyrilP: then that is almost certainly the cause of the split brain.  You could probably test this in a lab environment by creating an example replicated volume, bringing down a node, replacing a symlink, and bringing the node back up.  I would expect the same issue to arise again
21:12 CyrilP All went fine for most of the files, except for these symlinks, which could have been created during the healing process
21:13 CyrilP jbd1 that's the next step, reproduce it in vagrant-gluster env
21:14 CyrilP jbd1 but how do we avoid this issue again? I mean, what is the best practice for taking a node down for maintenance and bringing it back into the cluster without creating split-brain issues? (avoiding r/w is not a solution for our continuous build system)
21:15 jbd1 CyrilP: If you can reproduce the issue in vagrant-gluster, you've found a bug with GlusterFS.  Report it and cross your fingers that they fix it ASAP
21:16 CyrilP ok great, I'll try that... I will fix the files by hand and see what's happening
21:16 jbd1 CyrilP: I definitely don't think you did anything wrong; your scenario is well within the established set of use cases for GlusterFS.
21:16 CyrilP great, I was afraid I had :)
21:17 jbd1 CyrilP: and the rest of us are vulnerable to the same thing happening if there's a bug.  I dread having to reboot or upgrade a node because of stories like this.
21:18 jbd1 Red Hat's recommended practices seem to ignore the fact that for most of their customers (and those of us who use the community edition) it's not feasible to stop using a volume entirely during maintenance
21:18 dbruhn I seem to remember entering a bug/feature request for a graceful shutdown at some point in time for bricks/servers
21:19 jbd1 dbruhn: the key for me would be the ability to tell GlusterFS to prefer brick A over brick B when brick B rejoins the cluster, so GlusterFS can automatically resolve split-brain issues after an outage
21:20 dbruhn https://bugzilla.redhat.com/show_bug.cgi?id=1025415
21:20 glusterbot Bug 1025415: unspecified, unspecified, ---, gluster-bugs, NEW , Enhancement: Server/Brick Maintenance Mode
21:20 jbd1 dbruhn: and a "graceful" shutdown would really just be that, plus don't attempt brick B after the shutdown completes.  Ah, yes!
21:20 dbruhn https://bugzilla.redhat.com/show_bug.cgi?id=1025411
21:20 glusterbot Bug 1025411: unspecified, unspecified, ---, gluster-bugs, NEW , Enhancement: Self Heal Selection Automation
21:20 dbruhn lol
21:21 CyrilP jbd1 one more thing, our second node had a time issue (12h offset)
21:21 CyrilP could it be a reason ?
21:21 dbruhn That should be resolved
21:21 dbruhn no matter what
21:21 jbd1 CyrilP: perhaps. If the time issue was a time zone issue, that would not cause it
21:23 jbd1 dbruhn: looks like your bugs are getting the standard treatment :(
21:23 dbruhn these things happen, might motivate me to learn to code one of these days ;)
21:24 jbd1 I write code every day, but that doesn't mean I want to hack on glusterfs
21:25 dbruhn I work on gluster every day, I still would like to be able to contribute.
21:27 edward1 joined #gluster
21:30 hagarth joined #gluster
21:31 dbruhn It's a community and for every person that puts something in, we all get something out, and more people are motivated to put in if others are. That's what sets great projects apart in my opinion.
21:35 dbruhn JoeJulian, are you out at Redhat summit?
21:42 CyrilP anyway, thanks jbd1, dbruhn for your help, I've fixed by hand my issue, I'll try to reproduce it on vagrant-gluster env
21:42 dbruhn CyrilP, good luck!
21:49 Georgyo joined #gluster
22:47 Ark joined #gluster
22:58 ctria joined #gluster
23:14 NCommander joined #gluster
23:14 NCommander Hey all, is there a trick to get gluster to listen to IPv6
23:16 chirino joined #gluster
