
IRC log for #gluster, 2016-02-05


All times shown according to UTC.

Time Nick Message
00:01 haomaiwa_ joined #gluster
00:04 ctria joined #gluster
00:14 Logos01 So ... I've had a gluster server-pool in my production environment for a few weeks now, and I cannot seem to get it to be stable. I've switched over completely from that configuration, wiped out the pool, and am going back to the drawing board as it were. But I'm hoping to get some insight to a few things that I think might be involved in my environmental instability.
00:15 Logos01 1) What, if anything, would be the result of a floating IP address being migrated from one brick to another while active network traffic for gluster is going from clients to the brick through said floating IP?
00:15 JoeJulian Logos01: insanity.
00:15 Logos01 2) What, if anything, would be the result of a client accessing a large number of files on a replica volume (not distributed but replica only) if one of the bricks' OS instances were rebooted while the stream was ongoing?
00:16 Logos01 JoeJulian: Could you go a little further than that? I suspect this is one of the major breakpoints for me but I'm not entirely sure.
00:16 JoeJulian What would happen is the traffic that was *supposed* to go to brick1 would be then trying to go to brick2. Hopefully they're on different ports or you could be writing two replicas to the same brick, and none to the other brick.
00:17 JoeJulian @mount server
00:17 glusterbot JoeJulian: (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns, or (#2) One caveat is that the clients never learn of any other management peers. If the client cannot communicate with the mount server, that client will not learn of any volume changes.
00:17 JoeJulian So a gluster mount, by definition, is HA.
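
A minimal sketch of what the @mount-server factoid means in practice for a FUSE mount: the server named at mount time only supplies the volume definition, and listing backup volfile servers covers the case where that one host is down when the client mounts. The hostnames, volume name, and mount point below are placeholders.

    # /etc/fstab entry
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,backup-volfile-servers=server2:server3  0 0

    # equivalent one-off mount
    mount -t glusterfs -o backup-volfile-servers=server2:server3 server1:/myvol /mnt/myvol
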
00:17 Logos01 JoeJulian: See, what I've seen tends to contradict that somewhat...
00:18 Logos01 How does the FUSE client handle stream disruptions?
00:18 JoeJulian I assume you're referring to a ,,(ping-timeout)
00:18 glusterbot The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
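
A hedged sketch of how the ping-timeout glusterbot describes is inspected and tuned; "myvol" is a placeholder volume name, and 42 seconds is the shipped default.

    # shows the option only if it has been changed from the default
    gluster volume info myvol | grep ping-timeout

    # set it explicitly (lowering it is rarely a good idea, per the factoid above)
    gluster volume set myvol network.ping-timeout 42
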
00:19 Logos01 JoeJulian: No I'm not referring to a ping timeout
00:19 Logos01 I'm referring to what happens if the network interface is suddenly disconnected while read/write operations are in progress.
00:20 Logos01 Like, the write is never completed. Or do all of these stream operations have sanity checks with ping tests of some sort?
00:20 JoeJulian After ping-timeout, that frame will be invalidated and the changes lost.
00:20 Logos01 And if it's a read event?
00:20 JoeJulian Your application will receive an error
00:21 JoeJulian If it's a read event, it will also ping-timeout and error.
00:21 Logos01 And that should show up in the FUSE client logs right?
00:21 JoeJulian ENOTCONN, iirc.
00:21 JoeJulian yes.
00:21 Logos01 Damnit.
00:21 Logos01 That makes no sense.
00:21 JoeJulian Why? How should it get the data that's at the other end of a disconnected wire?
00:22 JoeJulian We don't have quantum tunneling networks yet.
00:22 Logos01 No I mean it makes no sense in context of what I've observed.
00:22 JoeJulian (operative word is *yet*!!!)
00:22 Logos01 As in, that makes it *harder* to explain what has been happening for me.
00:22 Logos01 Precisely because I've had my jboss servers behave as though the glusterfs volumes had suddenly gone offline for short periods
00:23 Logos01 But no errors ever show up in the logs -- neither server- nor client-side.
00:23 JoeJulian Ah, I see where your frustration is.
00:23 Logos01 Simultaneously, I never see error events in the client logs when I reboot individual brick nodes.
00:24 JoeJulian No, a reboot shouldn't be an error event (I think). IIRC, no more than a warning.
00:24 Logos01 Before I was using gluster, the way I could reproduce this behavior would be by unmounting and then remounting the network share briefly.
00:24 Logos01 JoeJulian: Well you'd see the warning messages about not being able to communicate with the node and then be able to communicate with it again.
00:24 JoeJulian Right
00:25 Logos01 But what you're saying is that as long as *one* brick node is available, then there should *never* be anything remotely resembling a "umount fuse-point ; mount fuse-point" event.
00:25 Logos01 And yet ... that's exactly the behavior I'm seeing.
00:26 Logos01 Without any warning/info messages corresponding to all bricks being offline.
00:27 Logos01 JoeJulian: I'll add to the consternation here; while the fuse-clients never reported any of the brick-nodes being offline, at one point, each brick node showed itself to be disconnected from the others.
00:27 lanning how long is "short periods"?
00:27 Logos01 lanning: so short that I could not detect it by running ls ${file} from within the fuse-mount directory.
00:28 JoeJulian What error message are you getting in the application?
00:28 Logos01 I never saw any.
00:29 Logos01 That's part of the more infuriating part -- the jboss servers' logs are extremely chatty but I never saw anything indicative of errors.
00:29 JoeJulian What indication do you have that there's a problem?
00:29 JoeJulian Let'
00:29 Mr_Psmith joined #gluster
00:29 JoeJulian Let's start with the symptoms.
00:29 Logos01 Well, there's the part where the page stopped loading and/or timed out during logins.
00:30 Logos01 And there's the part where when I went to restart the jboss instances they zombied out rather than shut down.
00:30 lanning packet loss?
00:31 Logos01 The only path I've ever had to reproducing that behavior has been unmounting the network share -- even re-mounting it *immediately* (as in -->  umount share ; mount share ) would produce this behavior.
00:32 Logos01 lanning: If there were packet loss the loadbalancer would have removed the jboss servers from the active-use pool dynamically; it uses icmp traffic for health checks.
00:32 lanning loss between the jbos server and the gluster servers
00:32 Logos01 I did however try pinging the machines involved at one point just to check for skips/jumps in packet return.
00:33 Logos01 lanning: In most cases the packets would never even have to hit the switch; they run on the same ESXi host.
00:33 Logos01 On the same VLAN and subnet.
00:33 mowntan joined #gluster
00:33 Logos01 This depends on the exact jboss server and gluster host -- there are 8 jboss OS instances and 3 gluster OSIs.
00:33 mowntan joined #gluster
00:34 mowntan joined #gluster
00:34 lanning the volume is a replicated volume, correct?
00:34 Logos01 replica 3
00:34 mowntan joined #gluster
00:34 mowntan joined #gluster
00:34 lanning so every write goes out 3 times and the return doesn't happen until all 3 return
00:35 Logos01 Unless the bricks aren't in connected state, which *did* happen to me at one point.
00:35 lanning (the reason why having one replica far away is a bad idea)
00:35 Logos01 lanning: All of our hosts are in one of two racks in a datacenter.
00:35 Logos01 Those racks are directly adjacent.
00:36 Logos01 So's the switching equipment.
00:36 Logos01 (In the same racks that is)
00:36 Logos01 I'd say that's pretty close... <_<
00:36 lanning :)
00:36 Logos01 I mean, I could rub their faces in each other and whisper in creepy voice, "now keeeettthhhh"
00:36 Logos01 But, y'know. Short of giving me emotional satisfaction I don't think we'd get anywhere.
00:37 Logos01 So let's go back here.
00:38 Logos01 I was using the floating IP addr to allow the clients to have one hostname/IP to connect to because my environment isn't set up for round-robin DNS.
00:38 Logos01 The brick hosts always talked directly to one another.
00:39 Logos01 One thing that's a bit disconcerting is how at one point I reduced the volume to replica 2, removed one brick (same operation), detached that host, and then peer-probed its new IPaddr and hostname.
00:39 lanning as long as the bricks are defined by static addresses, you can have a floating address just for the initial config grab
00:39 Logos01 It kept picking up the old hostname which was no longer valid.
00:39 lanning I have done that.
00:40 Logos01 lanning: Well, the thing is, one of the times I managed to create problems was by rebooting the server that the fIP was attached to.
00:40 Logos01 I did see in the netstat of the clients that the fIP was being used for 24007 and the volume port.
00:40 Logos01 (There were actually three volumes, but all three servers hosted all three with the same configuration)
00:41 Logos01 I've got a lot to detangle here, this is an utter rat's nest.
00:41 lanning ya, a cluster is a single entity with 0 or more volumes
00:41 Logos01 Why would the bricks revert to the wrong hostname?
00:41 Logos01 Err, wait, that's wrong terminology
00:41 lanning cache somewhere
00:41 Logos01 Why would the *peers* revert to the wrong hostname?
00:42 Logos01 The problem is that the old hostnames were no longer valid; they were mapped to IP addresses that were unresponsive.
00:42 Logos01 So when that revert happened, the peer became detached.
00:42 lanning the server UUID
00:43 lanning you changed the hostname/IP but not the UUID
00:43 Logos01 Right, even though I did a "gluster peer detach" and then re-added it later?
00:44 Logos01 I must be missing something here; is there no clean path for hostname changes for a cluster-instance's brick hosts?
00:45 ovaistariq joined #gluster
00:45 lanning I have never tried...
00:45 Logos01 (My positive outcome here is mostly in the form of "you are an idiot, Logos01. You were doinitwrong and here's why"
00:45 Logos01 )
00:45 JoeJulian There is no clean path for changing a brick's hostname.
00:46 JoeJulian The one time I did it, I stopped the volume and did a "find -exec sed -i ..."
00:47 Logos01 JoeJulian: Yeah, once I saw that the hostnames kept reverting I went into the bricks -- and the clients -- and made /etc/hosts entries so at least the IP addresses of ${old_hostname} and ${new_hostname} would correlate.
00:47 JoeJulian From then on, I used shortnames and cnames to ensure I never had to do it again.
00:47 Logos01 Unfortunately, by that point the brick hosts were already in detached state and had been for hours.
00:48 Logos01 The peculiar thing about this is that the jboss applications never complained during this detached-state period.
00:48 JoeJulian btw... I've had a long-standing enhancement request to use the server uuid internally so changing hostnames would be simple.
00:49 Logos01 Well if my +1'ing it helps at all I'd be happy to.
00:49 harish joined #gluster
00:49 JoeJulian bug 765437
00:49 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=765437 low, medium, ---, bugs, ASSIGNED , [FEAT] Use uuid in volume info file for servers instead of hostname or ip address
00:49 lanning rename brick host: 1) wipe host 2) re-install... :)
00:49 JoeJulian Hehe
00:50 JoeJulian sed -i is much easier.
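
Roughly what the "find -exec sed -i" rename JoeJulian describes looks like. This is not an officially supported path, so treat it as a sketch: old-host and new-host are placeholders, the volume should be stopped, and /var/lib/glusterd should be backed up first on every server.

    systemctl stop glusterd                        # or: service glusterd stop
    cp -a /var/lib/glusterd /var/lib/glusterd.bak

    # rewrite the old hostname everywhere it appears in the config
    find /var/lib/glusterd -type f -exec sed -i 's/old-host/new-host/g' {} +

    systemctl start glusterd
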
00:53 Logos01 Long-standing as in four years old plus change.
00:53 Logos01 Nice.
00:54 theron joined #gluster
00:56 * Logos01 grumbles once again about his total inability to replicate the workload of his production environment *ANYWHERE*.
00:56 JoeJulian Been there, done that.
00:56 JoeJulian Hell, I'm still doing that.
00:56 Logos01 I need a way to emulate the highly parallel access and writing of large numbers of small files or else I'm never gonna be able to go back to using gluster in my environment I think.
00:56 Logos01 I completely wiped it out. I'm seriously using sshfs on a standalone server now.
00:57 JoeJulian I even have the gear where I *could* have duplicated what I'm doing, but nobody wanted to.
00:57 JoeJulian You know, even CERN uses gluster for doing essentially what you're describing. It can be done.
00:58 Logos01 The problem is that I can't demonstrate that anymore without the risk of taking down my production environment again.
00:58 plarsen joined #gluster
00:58 Logos01 It's happened pretty much every day for the last week.
00:59 Logos01 So ... without that evidence of reliability, there's no way I can recommend it again.
00:59 Logos01 This irritates me because it *should* be simple enough to make work correctly.
00:59 JoeJulian Well, capture all the logs you can and share anything you have a question about.
01:00 Logos01 Yeah, I dunno, everything I've read -- and what you've reaffirmed -- is contradicting what I've seen.
01:00 JoeJulian Btw... I *have* seen similar to what you're describing before, and it was the network.
01:00 JoeJulian but who knows.
01:00 Logos01 Hrm.
01:00 lanning do you have bonded ethernet ports?
01:01 Logos01 All of the OSIs I have referred to are VMware VMs
01:01 Logos01 All of my hypervisors are HP blades in one of three chassis.
01:01 haomaiwang joined #gluster
01:01 lanning I am wondering about out of order packets.
01:01 lanning not detectable via ping
01:01 glusterbot lanning: Please don't naked ping. http://blogs.gnome.org/markmc/2014/02/20/naked-pings/
01:01 Logos01 All chassis have aggregated link connections to multiple CISCO switches, and aggregated fiber channel connections to SAN backend.
01:02 lanning all LACP?
01:02 Logos01 Yes.
01:02 JoeJulian I need to find the cpu failure I read about once. West coast admin with a server in an east coast DC. He figured out the reason he had issues was one of his cpu cores was going bad (one core out of 4, but dual processors). He figured out a way to just disable that one core until he could get the cpu replaced.
01:02 JoeJulian Weird things happen sometimes.
01:03 Logos01 Well, no, you actually hit a live wire here for me.
01:04 Logos01 We had a recent storage disruption event involving communication between the SAN and the ESXi hosts ...
01:04 Logos01 Two in half an hour's time, each less than a second.
01:04 lanning there are things like this: http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html
01:04 glusterbot Title: The little ssh that (sometimes) couldn't - Mina Naguib (at mina.naguib.ca)
01:04 Logos01 FC and ethernet switching is being handled by the same equipment.
01:05 Logos01 JoeJulian: You said you've seen similar to this before and it "was the network"
01:05 Logos01 How did you come to that diagnosis?
01:06 JoeJulian tcpdump from both ends.
01:06 JoeJulian wireshark
01:06 Logos01 In the midst of a disruption?
01:06 JoeJulian wireshark does know how to decode gluster rpc, btw.
01:06 Logos01 Or just in ordinary state.
01:07 JoeJulian ordinary state had lots of retries and lost packets. Clearly evident when I looked at the traces.
01:07 JoeJulian We were replacing the network equipment anyway, so we did and the problem went away.
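
A rough capture along the lines JoeJulian describes, run on both ends and opened in wireshark afterwards; the interface, peer hostname, and brick port range below are placeholders (management traffic uses 24007, and brick ports on recent releases usually start at 49152).

    tcpdump -i eth0 -s 0 -w /tmp/gluster.pcap 'host gluster1 and (port 24007 or portrange 49152-49251)'
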
01:07 Logos01 Hrm.
01:08 Logos01 I've had to move off of using it so now I'm back at the same problem.
01:08 Logos01 But at least I have another candidate for explanation.
01:08 JoeJulian btw... thanks ndevos, again, for the gluster decoding.
01:08 JoeJulian It's always nice to be able to blame the network guys.
01:08 Logos01 I guess I could build another gluster pool and (I know, right?) start shoving some dd onto loopfiles.
01:09 lanning or, the worst gluster fuse use case... PHP...
01:14 JoeJulian An even worse subset of php problems: disk based session data.
01:20 Mr_Psmith I am currently running gluster on 1gbe; Is gluster RDMA (infiniband) a stable thing or should I spend money on 10 gbe instead?
01:20 gildub joined #gluster
01:22 Logos01 JoeJulian: ... that's *WHAT* I have.
01:22 Logos01 disk based session data.
01:22 Logos01 :(
01:22 JoeJulian Session data should be in redis or memcached.
01:22 Logos01 Yes, yes it should.
01:23 Logos01 100% agree.
01:23 JoeJulian Mr_Psmith: I know people that have used it successfully for years. Others have had nothing but trouble. It usually comes down to hardware and drivers.
01:23 Logos01 And I am almost daily saying exactly as much to the developer(s) who maintain this codebase.
01:25 JoeJulian It can be done without changing the codebase if they're using the php tools for session management. The only changes necessary are in the php configuration.
01:27 Logos01 JoeJulian: It's java / jboss, not php
01:27 Logos01 But it's disk-based session data.
01:31 JoeJulian Logos01: btw.. did you know that there's a jni for gluster?
01:32 JoeJulian https://github.com/semiosis/libgfapi-jni
01:32 glusterbot Title: GitHub - semiosis/libgfapi-jni: Java Native Interface (JNI) bindings for libgfapi (the GlusterFS client API) (at github.com)
01:40 JoeJulian Oooh, lanning. You weren't one of the many that got axed for the certificate fiasco. I bet that was interesting.
02:01 haomaiwa_ joined #gluster
02:04 hagarth joined #gluster
02:09 dlambrig joined #gluster
02:12 nishanth joined #gluster
02:18 baojg joined #gluster
02:28 theron joined #gluster
02:29 haomaiwa_ joined #gluster
02:46 lanning JoeJulian: Nope. I was dismantling the datacenters that I built, before they even went into production. Saw the writing and left before the whole project went under.
02:48 ilbot3 joined #gluster
02:48 Topic for #gluster is now Gluster Community - http://gluster.org | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
02:49 harish joined #gluster
03:01 haomaiwang joined #gluster
03:10 shyam joined #gluster
03:23 mowntan joined #gluster
03:23 mowntan joined #gluster
03:23 mowntan joined #gluster
03:39 Bhaskarakiran joined #gluster
03:45 sakshi joined #gluster
03:48 JoeJulian Oh! Man! I totally called that 10 years ago! The best part is if it's not covered by HIPAA rules you can totally correlate data and discover diagnostic trends.
03:56 ahino joined #gluster
03:57 nbalacha joined #gluster
03:59 Logos01 JoeJulian: No I did not know that, and it's quite interesting, though not necessarily the best fit for my org. I'll have to look into it though.
04:01 shubhendu joined #gluster
04:02 JoeJulian Just put some machine learning around the data and it should be a no-brainer.
04:02 JoeJulian (because machine learning is easy ;) )
04:03 baojg joined #gluster
04:04 gem joined #gluster
04:07 itisravi joined #gluster
04:10 Bhaskarakiran joined #gluster
04:11 baojg joined #gluster
04:12 dusmantkp_ joined #gluster
04:18 rafi joined #gluster
04:22 kanagaraj joined #gluster
04:23 David_Varghese joined #gluster
04:27 20WAAA8VC joined #gluster
04:35 skoduri joined #gluster
04:39 baojg joined #gluster
04:40 Manikandan joined #gluster
04:43 shubhendu joined #gluster
04:51 atinm joined #gluster
05:01 haomaiwa_ joined #gluster
05:02 rcampbel3 joined #gluster
05:02 ndarshan joined #gluster
05:05 hgowtham joined #gluster
05:09 gowtham joined #gluster
05:19 aravindavk joined #gluster
05:24 poornimag joined #gluster
05:25 monotek joined #gluster
05:26 arcolife joined #gluster
05:26 MACscr|lappy joined #gluster
05:27 vmallika joined #gluster
05:30 ppp joined #gluster
05:36 Apeksha joined #gluster
05:40 ramky joined #gluster
05:46 karnan joined #gluster
05:48 ashiq joined #gluster
06:00 atalur joined #gluster
06:01 vimal joined #gluster
06:01 haomaiwa_ joined #gluster
06:01 kovshenin joined #gluster
06:02 nishanth joined #gluster
06:10 RameshN_ joined #gluster
06:11 skoduri joined #gluster
06:23 shubhendu joined #gluster
06:28 unlaudable joined #gluster
06:30 Saravanakmr joined #gluster
06:35 RameshN_ joined #gluster
06:39 nehar joined #gluster
06:43 sakshi joined #gluster
06:46 atalur joined #gluster
06:48 dusmantkp_ joined #gluster
06:58 SOLDIERz joined #gluster
06:59 mhulsman joined #gluster
07:00 mhulsman1 joined #gluster
07:01 haomaiwa_ joined #gluster
07:03 ramky joined #gluster
07:11 David_Varghese joined #gluster
07:16 baojg joined #gluster
07:32 ctria joined #gluster
07:35 jtux joined #gluster
07:36 nishanth joined #gluster
07:40 unlaudable joined #gluster
07:42 [Enrico] joined #gluster
07:56 baojg joined #gluster
08:01 haomaiwa_ joined #gluster
08:06 aravindavk joined #gluster
08:06 robb_nl joined #gluster
08:14 nehar joined #gluster
08:21 ramky joined #gluster
08:22 fsimonce joined #gluster
08:29 [diablo] joined #gluster
08:32 auzty joined #gluster
09:01 haomaiwang joined #gluster
09:02 EinstCrazy joined #gluster
09:13 kshlm joined #gluster
09:14 baojg joined #gluster
09:17 PaulCuzner left #gluster
09:25 shubhendu joined #gluster
09:36 baojg joined #gluster
09:40 mhulsman joined #gluster
09:43 mhulsman1 joined #gluster
09:45 JesperA joined #gluster
09:46 dusmantkp_ joined #gluster
09:56 ramky joined #gluster
09:57 ctria joined #gluster
10:01 haomaiwa_ joined #gluster
10:03 Slashman joined #gluster
10:16 baojg joined #gluster
10:21 unlaudable joined #gluster
10:22 EinstCrazy joined #gluster
10:26 shubhendu joined #gluster
10:26 unlaudable joined #gluster
10:27 aravindavk joined #gluster
10:31 unlaudable joined #gluster
10:34 karnan joined #gluster
10:36 EinstCrazy joined #gluster
10:37 saltsa joined #gluster
11:01 haomaiwa_ joined #gluster
11:03 harish_ joined #gluster
11:05 harish_ joined #gluster
11:08 post-fac1um joined #gluster
11:10 ccha2 joined #gluster
11:12 itisravi_ joined #gluster
11:12 edong23_ joined #gluster
11:13 tru_tru joined #gluster
11:13 yalu joined #gluster
11:16 sac joined #gluster
11:17 klaxa joined #gluster
11:17 p8952 joined #gluster
11:17 xMopxShell joined #gluster
11:19 msvbhat joined #gluster
11:20 social joined #gluster
11:22 post-factum joined #gluster
11:25 shubhendu joined #gluster
11:26 jtux joined #gluster
11:41 robb_nl joined #gluster
11:45 dataio joined #gluster
11:48 ackjewt joined #gluster
11:51 baojg joined #gluster
11:52 ramky joined #gluster
12:01 haomaiwa_ joined #gluster
12:05 luizcpg joined #gluster
12:08 mowntan joined #gluster
12:14 Wizek joined #gluster
12:14 Wizek Hello!
12:14 glusterbot Wizek: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
12:26 baojg joined #gluster
12:26 kshlm joined #gluster
12:29 EinstCrazy joined #gluster
12:44 baojg joined #gluster
13:00 [diablo] afternoon guys.. anyone using the RHGS please? ... trying to find other users, got a problem with the CTDB SMB sharing
13:00 ira joined #gluster
13:01 haomaiwa_ joined #gluster
13:02 baojg joined #gluster
13:08 ramky joined #gluster
13:18 unclemarc joined #gluster
13:24 shubhendu joined #gluster
13:27 Mr_Psmith joined #gluster
13:29 fgd joined #gluster
13:30 fgd Hi everyone! I'm having issues with two node gluster setup and I'm currently lost :( Can anyone help?
13:33 rafi1 joined #gluster
13:35 post-factum fgd: any description of your issue?
13:36 gowtham joined #gluster
13:43 fgd post-factum: Indeed. The two nodes originally had one brick each, with 3 volumes using them in a replicated setup. Afterwards an additional brick was added to each node and attached to one of the volumes. Then rebalance was started, which failed a few times before I upgraded to the latest version of Gluster. The last rebalance job was going well until one of the nodes became unresponsive and was rebooted. Now, whenever I start the gluster daemon on this node, all IO stops on clients mounting the volumes.
13:45 fgd post-factum: the only way to have the mounts working is to have only the second node running, yet this results in very high IO and CPU load.
13:49 post-factum any logs from failing node and stalled client?
13:52 fgd let me get some
13:53 jwang_ joined #gluster
13:54 plarsen joined #gluster
13:54 julim joined #gluster
14:01 haomaiwa_ joined #gluster
14:09 dlambrig joined #gluster
14:16 jmarley joined #gluster
14:16 jmarley joined #gluster
14:25 Bhaskarakiran joined #gluster
14:29 baojg joined #gluster
14:35 fgd post-factum: I was able to piece something together: http://k00.fr/ga82z
14:35 glusterbot Title: Download link for gluster_logs (at k00.fr)
14:43 kenansulayman joined #gluster
14:45 robb_nl_ joined #gluster
14:45 B21956 joined #gluster
14:46 baojg joined #gluster
14:50 kshlm joined #gluster
14:50 hamiller joined #gluster
14:54 skylar joined #gluster
14:56 post-factum fgd: i see versionong mess in your log (3.7.3 vs 3.7.6)
14:56 post-factum s/versionong/versioning/
14:56 glusterbot What post-factum meant to say was: fgd: i see versioning mess in your log (3.7.3 vs 3.7.6)
14:57 post-factum fgd: have you updated *all* glusterfs packages on all systems to 3.7.6?
14:57 Wizek joined #gluster
14:59 haomaiwa_ joined #gluster
15:01 David_Varghese joined #gluster
15:02 fgd post-factum: storage nodes show the same version for glusterfs-server ( 3.7.6-ubuntu1~trusty1)
15:02 fgd post-factum: clients were not updated, though
15:04 post-factum fgd: make sure you have the same versions for the whole cluster including clients
15:05 fgd post-factum: on it :)
15:06 fgd post-factum: my plan is to backup the files on the failing node/bricks, then wipe them and add them back to the volumes. Is this viable?
15:07 post-factum fgd: i see no reason to do so, at least, for now
15:09 saltsa joined #gluster
15:14 gem joined #gluster
15:17 fgd post-factum: you think the version mismatch is at fault?
15:20 post-factum fgd: cannot state that unconditionally, but i've faced some weird issues during cluster update while versions were not the same
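
A quick way to confirm post-factum's point on every server and every client; the package query assumes a Debian/Ubuntu system like fgd's.

    glusterfs --version | head -1
    dpkg -l | grep -i glusterfs
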
15:25 baojg joined #gluster
15:30 coredump joined #gluster
15:33 haomaiwa_ joined #gluster
15:44 neofob joined #gluster
15:47 gem joined #gluster
15:58 farhoriz_ joined #gluster
16:01 haomaiwa_ joined #gluster
16:03 frakt_ joined #gluster
16:03 shubhendu joined #gluster
16:05 Iouns_ joined #gluster
16:06 mhulsman joined #gluster
16:08 jmarley joined #gluster
16:08 jbrooks joined #gluster
16:10 coredump joined #gluster
16:11 wushudoin joined #gluster
16:14 fgd post-factum, Thanks for the help. I will try to run the second node over the weekend and test how it performs (all clients now being the same version).
16:14 fgd post-factum: would you say that the plan with attaching clean bricks could solve the problem?
16:15 Akee joined #gluster
16:16 farhoriz_ joined #gluster
16:17 baojg joined #gluster
16:18 rossdm joined #gluster
16:19 jwd joined #gluster
16:21 neofob joined #gluster
16:22 nickage_ joined #gluster
16:22 shubhendu joined #gluster
16:25 Manikandan joined #gluster
16:27 fubada JoeJulian: I need to sync 22GB of small files between 2 gluster clusters, a 2x2 v 3.6 and a 1x2 v 3.7.4. Is my best option to use mountpoints on a client machine and rsync?
16:33 baojg joined #gluster
16:33 theron joined #gluster
16:37 tswartz fubada, if it's one sync it won't be so bad. if it's a continuous thing, you are going to have performance issues most likely
16:42 rcampbel3 joined #gluster
16:42 fubada tswartz: yah just a one time migration thing
16:42 fubada trying to ditch the last volume off my 3.6
16:43 tswartz should be good then, i didn't have any issue with 50gb of small files
16:43 fubada another way would be to add my 3.6 as a brick under that volume in new 3.7?
16:43 fubada then remove-brick>?
16:44 baojg joined #gluster
16:47 tswartz i would imagine yes. i'm no expert on gluster yet :)
16:55 JoeJulian I would probably tar rather than rsync, myself.
16:56 JoeJulian tar+nc is faster than rsync if you don't need the encryption.
16:59 fubada thanks JoeJulian
16:59 fubada and adding a brick and then removing after the sync..bad idea?
17:00 fubada JoeJulian: i also dont need nc as I can mount both on the same host
17:00 fubada is rsync best in that case?
17:01 haomaiwa_ joined #gluster
17:02 bennyturns joined #gluster
17:02 overclk joined #gluster
17:06 jmarley joined #gluster
17:06 squizzi_ joined #gluster
17:06 David_Varghese joined #gluster
17:10 JoeJulian Sure, rsync is just cp at that point, though I wish you could adjust the block sizes it reads/writes. iirc it's 512 bytes. Not very network efficient.
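
Hedged sketches of both options discussed above, for fubada's case where both volumes can be FUSE-mounted; paths and hostnames are placeholders, and netcat flag syntax varies between flavours.

    # both volumes mounted on one host
    tar -C /mnt/oldvol -cf - . | tar -C /mnt/newvol -xpf -

    # tar+nc between two hosts, no encryption
    # on the receiver:
    nc -l 9000 | tar -C /mnt/newvol -xpf -
    # on the sender:
    tar -C /mnt/oldvol -cf - . | nc receiver-host 9000
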
17:10 muneerse2 joined #gluster
17:11 kshlm joined #gluster
17:15 klaxa joined #gluster
17:15 coredump joined #gluster
17:24 brettster55 joined #gluster
17:26 brettster55 Hey everyone. I'm being blocked by a NAT gateway trying to pull down GlusterFS. What settings/files do I need to alter in the client and server in order to use username/password auth rather than IP based auth?
17:28 JoeJulian I don't think you can. I think ssl is your only option. http://gluster.readthedocs.org/en/latest/Administrator%20Guide/SSL/?highlight=ssl
17:28 glusterbot Title: SSL - Gluster Docs (at gluster.readthedocs.org)
17:29 brettster55 I'm referencing an old post here that says you can, https://www.gluster.org/pipermail/gluster-users/2008-July/000100.html
17:29 glusterbot Title: [Gluster-users] Mounting glusterfs through NAT? (at www.gluster.org)
17:31 JoeJulian Wow... has it really been more than 8 years?!?!
17:32 brettster55 haha I tried to follow those instructions and got nowhere. Still blocked
17:32 JoeJulian So yeah, that's completely irrelevant with the current code base.
17:32 JoeJulian ssl is your only option.
17:33 JoeJulian And, you do know that your client will need to be able to resolve and connect to all the servers in your volume, yes?
17:34 overclk joined #gluster
17:34 brettster55 the client has no issue resolving server names, hitting the internet etc
17:34 JoeJulian Ok, cool.
17:35 JoeJulian So now that I've said it isn't possible... there's a chance it might be.
17:36 JoeJulian You couldn't mount using the mount command or through fstab, but if you added options to the gluster command line, you might be able to use the username and password options to the client translator as specified in the volume info file.
17:37 JoeJulian So on your server in /var/lib/glusterd/vols/$volname/info there's a username and password (2 uuids)
17:38 brettster55 i see it
17:38 brettster55 q
17:39 brettster55 how do I supply those creds in gluster CLI
17:39 brettster55 from the client
17:39 JoeJulian Theoretically if you mounted your volume using the glusterfs command line, you should be able to add those options like "--xlator-option *client*.username=$username --xlator-option *client*.password=$password"
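
The rough shape of the theory JoeJulian is proposing, invoking the glusterfs client binary directly rather than mount(8). The server, volume, mount point, and the two UUIDs (copied from the server's info file) are placeholders, and this is only the idea being tested in the lines that follow.

    glusterfs --volfile-server=us-west-1-gluster.testserver.com \
              --volfile-id=vol1 \
              --xlator-option '*client*.username=<uuid-from-info>' \
              --xlator-option '*client*.password=<uuid-from-info>' \
              /mnt/glusterfs
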
17:42 brettster55 let me try
17:42 cpetersen joined #gluster
17:47 brettster55 @JoeJulian can you give me an example for how to mount with those options? I've only used fstab
17:47 JoeJulian Just look at ps for a successful mount and add those switches.
17:49 brettster55 I don't have a successful mount with the NAT gateway in place
17:49 hagarth joined #gluster
17:52 cpetersen_ joined #gluster
17:53 cpetersen_ hey JoeJulian
18:01 64MAAZ7VE joined #gluster
18:02 brettster55 JoeJulian, does this look correct. I'm getting errors with it. "$ sudo mount -t glusterfs us-west-1-gluster.testserver.com:/vol1 /mnt/glusterfs --xlator-option *client*.username=wdadwa --xlator-option *client*.password=123"
18:04 JoeJulian You can't use the mount command.
18:04 JoeJulian Mount the volume on one of the servers and look at ps
18:05 JoeJulian And in case I miscommunicated, you don't change the username and password in the info file, you have to use the values that were already in there.
18:06 JoeJulian I'm sure you were just throwing in fillers, but I just wanted to make sure I wasn't confusing things.
18:06 brettster55 yea, I'm on the server, I used ps aux and I see the command used
18:08 brettster55 do I use that same command from ps aux and use it on the client, I guess i'm confused
18:08 JoeJulian Yes, adding those xlator-options.
18:11 brettster55 okay I took the command from the server (ps aux), added the --xlator-options to it using the creds from the info file, ran it on client, and it did not mount
18:11 nbalacha joined #gluster
18:12 JoeJulian Bummer. Check the client log to see if there's any clues, but it was just a theory.
18:15 jmarley joined #gluster
18:18 brettster55 it can't find the pidfile /var/lib/glusterd/vols/vol1 open failed [No such file or directory]
18:18 brettster55 that file is on the server, not client
18:20 DV joined #gluster
18:27 PaulCuzner joined #gluster
18:32 dgandhi joined #gluster
18:32 JoeJulian Looks like you found a glusterfsd process, not a glusterfs process.
18:33 dgandhi joined #gluster
18:33 zerick joined #gluster
18:36 kshlm joined #gluster
18:38 siel joined #gluster
18:39 zerick joined #gluster
18:42 cliluw joined #gluster
19:01 haomaiwa_ joined #gluster
19:18 mhulsman joined #gluster
19:18 jwaibel joined #gluster
19:21 jwd joined #gluster
19:27 zerick joined #gluster
19:37 Guest69439 joined #gluster
19:41 brettster55 what is the diff between glusterfs and glusterfsd
19:57 JoeJulian @processes
19:57 glusterbot JoeJulian: The GlusterFS core uses three process names: glusterd (management daemon, one per server); glusterfsd (brick export daemon, one per brick); glusterfs (FUSE client, one per client mount point; also NFS daemon, one per server). There are also two auxiliary processes: gsyncd (for geo-replication) and glustershd (for automatic self-heal).
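
One way (among several) to see each of those processes on a server, matching the factoid above:

    ps -C glusterd   -o pid,args   # management daemon, one per server
    ps -C glusterfsd -o pid,args   # brick export daemons, one per brick
    ps -C glusterfs  -o pid,args   # FUSE mounts, NFS server, self-heal daemon
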
20:01 haomaiwa_ joined #gluster
20:02 zerick joined #gluster
20:02 cholcombe joined #gluster
20:11 robb_nl_ joined #gluster
20:14 Melamo joined #gluster
20:29 cpetersen joined #gluster
20:48 coredump joined #gluster
20:50 theron joined #gluster
21:01 haomaiwa_ joined #gluster
21:01 klaxa joined #gluster
21:03 cliluw joined #gluster
21:06 cliluw joined #gluster
21:25 raghu joined #gluster
21:25 cpetersen If I have a heal-failed entry following a staged failure of a single node, how do I recover from this?
21:26 plarsen joined #gluster
21:32 ovaistariq joined #gluster
21:36 JoeJulian It should recover automatically on the next run.
21:38 cpetersen yah I just put in the heal info command again and it was done :)
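
For reference, the commands behind that exchange, with "myvol" as a placeholder volume name; heal state normally clears on its own, as JoeJulian says.

    gluster volume heal myvol info    # entries still pending heal
    gluster volume heal myvol full    # force a full self-heal crawl if it does not clear
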
21:38 cpetersen so JoeJulian
21:38 cpetersen I have a deeply concerning question
21:39 cpetersen I went over to clusterlabs and dude told me to not do any recovery testing until I have fencing setup
21:39 cpetersen I was thinking of not fencing
21:39 cpetersen what say you
21:43 JoeJulian Gluster has its own tools for that, server and/or volume quorum.
21:44 cpetersen I have my quorum-type set to auto on the volume.
21:44 cpetersen Is that enough?
21:44 JoeJulian yes
21:44 JoeJulian though server quorum is valuable as well.
21:44 cpetersen How does it work?
21:45 JoeJulian There may be some value to fencing in general, but only for things that are outside of gluster's control, ie. overheating, security faults, etc.
21:45 JoeJulian server quorum: if a server cannot see enough peers, it stops serving its bricks until it regains quorum.
21:46 cpetersen But with only 3 peers total, is that still valuable?
21:46 JoeJulian Usually
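
Roughly the settings being discussed, assuming a replica 3 volume named "myvol"; option names are as they exist in the 3.x series.

    # client-side (volume) quorum: writes require a majority of the replica set
    gluster volume set myvol cluster.quorum-type auto

    # server-side quorum: a peer that loses sight of the majority stops serving its bricks
    gluster volume set all cluster.server-quorum-ratio 51%
    gluster volume set myvol cluster.server-quorum-type server
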
21:47 cpetersen I will have to do some reading on Quorum.
21:47 ctria joined #gluster
21:47 cpetersen Did you have a chance to play with Ganesha yet?
21:48 JoeJulian I use ganesha at home. I don't use the gluster configuration integration with it yet.
21:48 JoeJulian I just manually set up my volumes.
21:52 cpetersen Ohhhh ok I see!
21:56 post-factum i played with ganesha a little bit
21:56 post-factum found memory leak there too, and devs fixed it :)
21:56 post-factum was dancing around HA with ganesha and keepalived
22:01 haomaiwang joined #gluster
22:02 cpetersen Post-factum, ever seen this error?
22:02 cpetersen http://ur1.ca/ohhk4
22:02 glusterbot Title: #319177 Fedora Project Pastebin (at ur1.ca)
22:03 post-factum cpetersen: nope, never played with pacemaker
22:04 post-factum i've managed ganesha ha with keepalived + hand-crafted scripts around dbus
22:04 JoeJulian Have you asked the pacemaker users?
22:05 JoeJulian (I don't know if they have an IRC channel)
22:06 cpetersen No I have not.  Perhaps I shall.
22:06 cpetersen Ah I lied.
22:06 cpetersen Clusterlabs are the pacemaker guys.
22:07 cpetersen They told me to do fencing before we could really troubleshoot the error.
22:07 cpetersen Because he'd seen that I had STONITH disabled...
22:08 JoeJulian hagarth: ^
22:17 hagarth JoeJulian: kkeithley might be able to help with this better
22:17 hagarth cpetersen: if you need more assistance, please drop a note on gluster-users.
22:18 JoeJulian Ah, right... It must be Friday.
22:19 post-factum Friday all the evening
22:19 cpetersen I submitted a message to the ganesha dev mailing list as kkeithley is traveling. Thanks guys. :)
22:28 cpetersen_ joined #gluster
22:38 uebera|| joined #gluster
22:42 coredump joined #gluster
22:46 cpetersen_ Man oh man I hate split-brain...
22:47 cpetersen_ This is insane, I simulate one failure and I split-brain.
22:48 cpetersen_ I shut the network ports for one of my nodes, then 10 mins later brought it back.
22:48 cpetersen_ Is there anything I need to do beyond just turning everything back on to re-introduce the node in to the cluster?
22:50 cpetersen_ I feel like that wasn't a real failure.  I should have shut the whole server down.  I introduced that split-brain.
22:50 cpetersen_ agh
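
Hedged first steps for the split-brain cpetersen_ describes; "myvol", the brick path, and the file path are placeholders, and the per-file resolution subcommands assume a 3.7-era cluster.

    # list entries the clients consider split-brain
    gluster volume heal myvol info split-brain

    # per-file, policy-based resolution (3.7+)
    gluster volume heal myvol split-brain latest-mtime /path/inside/volume
    gluster volume heal myvol split-brain source-brick server1:/bricks/b1 /path/inside/volume
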
23:01 haomaiwa_ joined #gluster
23:45 plarsen joined #gluster
23:52 theron joined #gluster
23:57 JoeJulian cpetersen_: If you had quorum enabled, I don't know how you could have gotten split-brained.
23:57 cpetersen_ I have the quorum type set.
23:57 cpetersen_ Does that mean it's enabled?
23:59 zerick joined #gluster
