
IRC log for #gluster, 2015-06-08


All times shown according to UTC.

Time Nick Message
00:02 sonicrose joined #gluster
00:09 tessier glusterfsd won't start on any of my machines and I have no idea why. :(
00:10 tessier Been this way for a few weeks. The gluster project is rather stalled because of it.
00:10 tessier Nothing appears in the logs or anything.
00:11 tessier When I do a gluster peer probe I always get "peer probe: failed: Probe returned with unknown errno 107"
00:11 tessier Probably because glusterfsd isn't running
00:41 Prilly joined #gluster
01:21 Jandre joined #gluster
01:40 wkf joined #gluster
01:41 nangthang joined #gluster
02:14 JoeJulian tessier: glusterfsd is the brick server process. It's started by glusterd (the management daemon) when your volume is started. Error number 107 is "Transport endpoint is not connected" which usually means either glusterd is not running, or the port is firewalled.
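A quick sanity check for the situation JoeJulian describes is to confirm which daemons are actually running before probing peers; a minimal sketch, assuming a systemd-based host and the stock service name:

    # glusterd is the management daemon; it must be running for peer probes to work
    systemctl status glusterd
    pgrep -a glusterd
    # glusterfsd brick processes only appear once a volume has been created and started
    pgrep -a glusterfsd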
02:26 Prilly joined #gluster
02:38 Hamcube New problem now - after getting 3.7.1 installed on both CentOS 7 and Arch Linux - I tried to run a volume rebalance.
02:38 Hamcube Now all commands I try are causing the gluster CLI to hang
02:39 JoeJulian Any clues in cli.log or etc-glusterfs-glusterd.vol.log?
02:40 Hamcube I'm installing fpaste now so I can kick those over.
02:40 Hamcube I can't make sense of the logs
02:42 Hamcube http://paste.fedoraproject.org/229808/31303143
02:42 Hamcube Looks like a stack trace to me
02:43 JoeJulian it is. glusterd crashed
02:44 Hamcube This happens on both the CentOS 7 and Archlinux machine. Similar looking logs.
02:44 Hamcube Repeatable by restarting glusterd and trying to invoke a command from the CLI
02:44 JoeJulian Why does it think two servers are both on the same ip address?
02:44 Hamcube Lemme look at that
02:44 Hamcube Oh that is interesting
02:44 Hamcube Not sure why that would be. Two bricks on the same machine
02:45 Hamcube Well, not on that machine, so no
02:47 Hamcube eee900(192.168.0.30) has two bricks on the same machine, rampage(192.168.0.20) and gluster(192.168.0.67) all have single bricks
02:47 Hamcube x.x.x.67 is a VM I was using to try adding and removing a brick from gluster
02:48 Hamcube I had some data between the 3 bricks on .20 and .30, ran a rebalance to get some of it on .67
02:48 Hamcube And I, within seconds, ran volume status to see how it was going and then this happened
02:48 Hamcube Several restarts/reboots later, the problem persists.
02:49 Jandre joined #gluster
02:50 julim joined #gluster
02:53 Hamcube Ok, if all clients are disconnected and only .30 is up, gluster's CLI works. All other nodes must have glusterd stopped.
02:57 Hamcube In fact, it can be any server. As long as there is only a single node up, the CLI is good. Once peers start to join it goes south.
02:58 Hamcube I assume that's because the rebalance is being attempted as soon as a new node joins and maybe that's what's causing things to hang
03:02 [7] joined #gluster
03:17 soumya joined #gluster
03:24 shubhendu__ joined #gluster
03:27 Micromus joined #gluster
03:27 johnmark joined #gluster
03:27 devilspgd joined #gluster
03:27 nixpanic joined #gluster
03:27 R0ok_ joined #gluster
03:27 tuxcrafter joined #gluster
03:27 saltsa joined #gluster
03:27 msciciel_ joined #gluster
03:27 csim joined #gluster
03:28 cuqa_ joined #gluster
03:28 klaas joined #gluster
03:28 rp_ joined #gluster
03:28 morse joined #gluster
03:28 sblanton joined #gluster
03:28 nixpanic joined #gluster
03:28 jbrooks joined #gluster
03:28 Marqin joined #gluster
03:28 T0aD joined #gluster
03:39 nbalacha joined #gluster
03:51 bharata-rao joined #gluster
03:53 atinmu joined #gluster
03:53 nbalacha joined #gluster
03:59 TheSeven joined #gluster
04:07 itisravi joined #gluster
04:12 vimal joined #gluster
04:18 yazhini joined #gluster
04:22 ppai joined #gluster
04:26 kdhananjay joined #gluster
04:28 RameshN joined #gluster
04:30 spandit joined #gluster
04:36 vimal joined #gluster
04:45 kanagaraj joined #gluster
04:45 WildyLion joined #gluster
04:47 sripathi joined #gluster
04:49 zeittunnel joined #gluster
04:52 gem joined #gluster
04:57 jiffin joined #gluster
05:01 sakshi joined #gluster
05:04 hgowtham joined #gluster
05:04 RameshN joined #gluster
05:11 rafi joined #gluster
05:16 poornimag joined #gluster
05:24 Manikandan joined #gluster
05:25 Jandre joined #gluster
05:25 dusmant joined #gluster
05:26 arcolife joined #gluster
05:28 kshlm joined #gluster
05:30 tessier glusterd is definitely running...I checked and there is no firewall. Is there a particular tcp port I can check for connectivity to?
05:34 hagarth joined #gluster
05:35 ashiq joined #gluster
05:38 David_H__ joined #gluster
05:42 pppp joined #gluster
05:43 deepakcs joined #gluster
05:46 zeittunnel joined #gluster
05:53 glusterbot News from newglusterbugs: [Bug 1217722] Tracker bug for Logging framework expansion. <https://bugzilla.redhat.com/show_bug.cgi?id=1217722>
05:53 raghu` joined #gluster
05:55 hchiramm joined #gluster
05:58 schandra joined #gluster
06:01 maveric_amitc_ joined #gluster
06:03 rjoseph joined #gluster
06:03 atalur joined #gluster
06:05 overclk joined #gluster
06:07 atinmu tessier, it's 24007
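To verify that the management port is actually reachable from the probing node, something along these lines should work (a sketch; the host name is a placeholder and the firewall commands assume firewalld, as on CentOS 7):

    # from the node doing the peer probe
    nc -zv server1.example.com 24007
    # on the server, if the port is firewalled (firewalld example)
    firewall-cmd --permanent --add-port=24007/tcp
    firewall-cmd --reload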
06:11 spalai joined #gluster
06:12 karnan joined #gluster
06:24 coredump joined #gluster
06:27 smohan joined #gluster
06:33 asriram joined #gluster
06:36 schandra_ joined #gluster
06:39 anil joined #gluster
06:46 smohan_ joined #gluster
06:53 anrao joined #gluster
06:59 anrao joined #gluster
07:01 [Enrico] joined #gluster
07:05 nangthang joined #gluster
07:14 spalai left #gluster
07:20 Trefex joined #gluster
07:23 glusterbot News from newglusterbugs: [Bug 1229127] afr: Correction to self-heal-daemon documentation <https://bugzilla.redhat.com/show_bug.cgi?id=1229127>
07:29 ninkotech joined #gluster
07:29 ninkotech_ joined #gluster
07:33 soumya joined #gluster
07:36 PaulCuzner joined #gluster
07:37 Slashman joined #gluster
07:38 LebedevRI joined #gluster
07:38 Philambdo joined #gluster
07:43 asriram joined #gluster
07:53 glusterbot News from newglusterbugs: [Bug 1229139] glusterd: glusterd crashing if you run  re-balance and vol status  command parallely. <https://bugzilla.redhat.com/show_bug.cgi?id=1229139>
08:01 liquidat joined #gluster
08:02 DV joined #gluster
08:11 PaulCuzner left #gluster
08:12 PaulCuzner joined #gluster
08:13 PaulCuzner left #gluster
08:21 ctria joined #gluster
08:21 dusmant joined #gluster
08:21 owlbot joined #gluster
08:27 c0m0 joined #gluster
08:34 DV joined #gluster
08:38 SOLDIERz joined #gluster
08:39 danny__ joined #gluster
08:41 al joined #gluster
08:44 asriram joined #gluster
08:49 dusmant joined #gluster
08:55 _shaps_ joined #gluster
09:09 argonius mornin
09:17 autoditac joined #gluster
09:18 schandra-remote joined #gluster
09:19 mbukatov joined #gluster
09:20 nbalacha joined #gluster
09:20 nbalacha joined #gluster
09:25 hgowtham joined #gluster
09:26 asriram joined #gluster
09:33 dusmant joined #gluster
09:38 kshlm joined #gluster
09:46 dusmant joined #gluster
09:48 kbyrne joined #gluster
10:06 suliba joined #gluster
10:12 arcolife joined #gluster
10:19 jcastill1 joined #gluster
10:24 glusterbot News from newglusterbugs: [Bug 1229228] Ubuntu launchpad PPA packages outdated <https://bugzilla.redhat.com/show_bug.cgi?id=1229228>
10:24 glusterbot News from newglusterbugs: [Bug 1229226] Gluster split-brain not logged and data integrity not enforced <https://bugzilla.redhat.com/show_bug.cgi?id=1229226>
10:24 jcastillo joined #gluster
10:32 itisravi joined #gluster
10:34 glusterbot News from resolvedglusterbugs: [Bug 1220047] Data Tiering:3.7.0:data loss:detach-tier not flushing data to cold-tier <https://bugzilla.redhat.com/show_bug.cgi?id=1220047>
10:34 glusterbot News from resolvedglusterbugs: [Bug 1220052] Data Tiering:UI:changes required to CLI responses for attach and detach tier <https://bugzilla.redhat.com/show_bug.cgi?id=1220052>
10:34 glusterbot News from resolvedglusterbugs: [Bug 1206596] Data Tiering:Adding new bricks to a tiered volume(which defaults to cold tier) is messing or skewing up the dht hash ranges <https://bugzilla.redhat.com/show_bug.cgi?id=1206596>
10:34 glusterbot News from resolvedglusterbugs: [Bug 1221476] Data Tiering:rebalance fails on a tiered volume <https://bugzilla.redhat.com/show_bug.cgi?id=1221476>
10:34 glusterbot News from resolvedglusterbugs: [Bug 1206592] Data Tiering:Allow adding brick to hot tier too(or let user choose to add bricks to any tier of their wish) <https://bugzilla.redhat.com/show_bug.cgi?id=1206592>
10:41 arcolife joined #gluster
10:46 atinmu joined #gluster
10:50 kdhananjay joined #gluster
11:03 [Enrico] joined #gluster
11:03 [Enrico] joined #gluster
11:04 glusterbot News from resolvedglusterbugs: [Bug 1219848] Directories are missing on the mount point after attaching tier to distribute replicate volume. <https://bugzilla.redhat.com/show_bug.cgi?id=1219848>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1221967] Do not allow detach-tier commands on a non tiered volume <https://bugzilla.redhat.com/show_bug.cgi?id=1221967>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1221969] tiering: use sperate log/socket/pid file for tiering <https://bugzilla.redhat.com/show_bug.cgi?id=1221969>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1221534] rebalance failed after attaching the tier to the volume. <https://bugzilla.redhat.com/show_bug.cgi?id=1221534>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1219048] Data Tiering:Enabling quota command fails with "quota command failed : Commit failed on localhost" <https://bugzilla.redhat.com/show_bug.cgi?id=1219048>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1219842] [RFE] Data Tiering:Need a way from CLI to identify hot and cold tier bricks easily <https://bugzilla.redhat.com/show_bug.cgi?id=1219842>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1219850] Data Tiering: attaching a tier with non supported replica count crashes glusterd on local host <https://bugzilla.redhat.com/show_bug.cgi?id=1219850>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1220050] Data Tiering:UI:when a user looks for detach-tier help, instead command seems to be getting executed <https://bugzilla.redhat.com/show_bug.cgi?id=1220050>
11:04 glusterbot News from resolvedglusterbugs: [Bug 1220051] Data Tiering: Volume inconsistency errors getting logged when attaching uneven(odd) number of hot bricks in hot tier(pure distribute tier layer) to a dist-rep volume <https://bugzilla.redhat.com/show_bug.cgi?id=1220051>
11:06 dusmant joined #gluster
11:08 firemanxbr joined #gluster
11:08 atinmu joined #gluster
11:12 rafi1 joined #gluster
11:16 WildyLion joined #gluster
11:19 keycto joined #gluster
11:22 rafi joined #gluster
11:31 kdhananjay joined #gluster
11:34 Sjors joined #gluster
11:36 Sjors joined #gluster
11:44 kkeithley1 joined #gluster
11:54 rgustafs joined #gluster
11:56 dusmant joined #gluster
11:58 zeittunnel joined #gluster
12:02 rafi1 joined #gluster
12:06 ppai joined #gluster
12:23 psilvao joined #gluster
12:28 bfoster joined #gluster
12:30 psilvao Hi
12:30 glusterbot psilvao: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
12:33 psilvao Quick question: how can I debug a gluster replica? I would like to know which logs I should examine and what parameters are necessary to watch. Thanks in advance, Pablo
12:35 chirino joined #gluster
12:35 rafi joined #gluster
12:37 jcastill1 joined #gluster
12:42 jcastillo joined #gluster
12:43 kanagaraj joined #gluster
12:46 wkf joined #gluster
12:48 Slashman joined #gluster
12:56 B21956 joined #gluster
12:56 harish joined #gluster
13:00 bennyturns joined #gluster
13:05 aaronott joined #gluster
13:06 Philambdo joined #gluster
13:07 theron joined #gluster
13:18 dusmant joined #gluster
13:21 hamiller joined #gluster
13:23 dgandhi joined #gluster
13:25 Prilly joined #gluster
13:25 plarsen joined #gluster
13:28 georgeh-LT2 joined #gluster
13:34 RameshN joined #gluster
13:36 julim joined #gluster
13:37 bene2 joined #gluster
13:42 kdhananjay joined #gluster
13:46 Hamcube joined #gluster
13:55 glusterbot News from newglusterbugs: [Bug 1229331] Disperse volume : glusterfs crashed <https://bugzilla.redhat.com/show_bug.cgi?id=1229331>
13:57 bennyturns joined #gluster
14:01 arcolife joined #gluster
14:03 bene2 joined #gluster
14:07 c0m0 joined #gluster
14:13 wushudoin joined #gluster
14:19 cuqa_ joined #gluster
14:19 lpabon joined #gluster
14:20 dusmant joined #gluster
14:20 Hamcube I'm having a problem where all gluster CLI commands hang. This happened after I had attempted to rebalance the bricks.
14:25 ira joined #gluster
14:29 Hamcube CLI commands work if there is only one node in the cluster running, but as soon as I start glusterd on another machine, the CLI stops working.
14:32 DV__ joined #gluster
14:34 msvbhat Hamcube: Looks like your cluster is in an inconsistent state. Does the "gluster peer status" command work?
14:44 anil joined #gluster
14:52 psilvao Dear people: is the self-heal service on by default, or is it necessary to enable it by hand?
14:55 Hamcube msvbhat: No, peer status hangs as well
14:55 DV joined #gluster
14:56 pppp joined #gluster
15:01 rwheeler joined #gluster
15:05 glusterbot News from resolvedglusterbugs: [Bug 1221747] DHT: lookup-unhashed feature breaks runtime compatibility with older client versions <https://bugzilla.redhat.com/show_bug.cgi?id=1221747>
15:13 hagarth joined #gluster
15:34 anrao joined #gluster
15:35 theron joined #gluster
15:36 theron joined #gluster
15:42 Leildin JoeJulian, Is there a reason why a CentOS box with "grp-gluster1:/data on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)" as a mount point still creates files as the apache user rather than nobody, nobody ?
15:43 Leildin all samba access to the volume complies with the nobody, nobody rule :/
15:54 nbalacha joined #gluster
15:56 pppp joined #gluster
16:07 jamesp joined #gluster
16:07 jamesp hey all, anyone at home?
16:08 coredump joined #gluster
16:10 jamesp guys, after some advice as to whether GlusterFS is right for me...
16:10 jamesp i need some distributed storage (across 2 DCs) that we'll write files into once, and read many times
16:11 jamesp ability to cache at the client side would be most useful so we don't have to keep reading from the gluster
16:11 jamesp but it's a lot of files, a lot of them small(ish) sound files in .wav/.gsm formats
16:12 jamesp is glusterfs the right solution for me, or should i look at something else?
16:17 ashiq joined #gluster
16:17 Manikandan joined #gluster
16:18 rafi joined #gluster
16:19 cholcombe joined #gluster
16:22 Leildin joined #gluster
16:24 Hamcube joined #gluster
16:36 hgowtham joined #gluster
16:37 JoeJulian Leildin: There was once a filter translator, but that got left by the wayside. I don't think there's a way to do that currently.
16:38 JoeJulian jamesp: The geo-replication feature may be useful for that.
16:38 ashiq- joined #gluster
16:38 jamesp hi Joe
16:39 jamesp do you think the technology is suitable for what we're trying to do? I've read that there are concerns over small file sizes
16:39 jamesp but i couldn't gather whether that was for older releases...
16:39 jamesp also i couldn't find much info on client side caching of data
16:40 JoeJulian If you're going to have a replicated volume within one datacenter, then there is the self-heal overhead, but if not and you just want to use it for distribution then it should be fine.
16:41 JoeJulian @php
16:41 glusterbot JoeJulian: (#1) php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://joejulian.name/blog/optimizing-web-performance-with-glusterfs/ for details., or (#2) It could also be worth mounting fuse with glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH --negative-timeout=HIGH
16:41 glusterbot JoeJulian: --fopen-keep-cache
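As a concrete illustration of that factoid, mounting with the glusterfs client binary and longer cache timeouts might look like the sketch below; the timeout values, server, and volume names are placeholders to be tuned for the workload:

    # FUSE mount with relaxed caching (timeouts in seconds)
    glusterfs --volfile-server=server1 --volfile-id=myvol \
        --attribute-timeout=30 --entry-timeout=30 --negative-timeout=30 \
        --fopen-keep-cache /mnt/myvol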
16:42 jamesp i guess for total fault tolerance i'd ideally like 2 x GlusterFS servers in each DC
16:42 jamesp therefore a single server failure won't affect local operations
16:42 jamesp i don't expect high writes,
16:42 jamesp more read
16:43 JoeJulian Right, so now you're getting the fault tolerance at the cost of performance. Check out that blog article.
16:55 jamesp does the client cache?
16:55 jamesp then i don't have a performance issue because the actual reads on GlusterFS are very low too, it's served locally
16:57 JoeJulian If those timeouts are set to a high enough number to satisfy your needs.
16:58 JoeJulian My recommendation is always to determine your needs, then engineer a system that meets them.
16:59 soumya joined #gluster
16:59 jiffin joined #gluster
17:00 JoeJulian There's no hard and fast rules to it. Install gluster, see how it works, see where your bottlenecks are and if they're unacceptable, work on solving them or working around them. (imho)
17:03 Rapture joined #gluster
17:04 ashiq__ joined #gluster
17:04 jamesp ok thanks Joe,
17:04 jamesp any other technology i should consider?
17:05 JoeJulian rsync?
17:05 JoeJulian Putting all your media on a CDN?
17:05 rveroy joined #gluster
17:10 CyrilPeponnet Hey guys, I still have CPU spikes for the glusterfs process holding nfs, which make clients hang every 10s
17:10 CyrilPeponnet I'm trying to narrow down what is going on
17:10 CyrilPeponnet any help will be appreciated
17:14 jamesp hmm, no not rsync
17:14 jamesp that doesn't help at all really
17:18 RameshN joined #gluster
17:25 glusterbot News from newglusterbugs: [Bug 1229422] server_lookup_cbk erros on LOOKUP only when quota is enabled <https://bugzilla.redhat.com/show_bug.cgi?id=1229422>
17:29 mcpierce joined #gluster
17:31 jobewan joined #gluster
17:33 haomaiwang joined #gluster
17:36 ashiq__ joined #gluster
17:37 sage joined #gluster
17:38 CyrilPeponnet @anyone ? The whole system is barely usable through nfs
17:47 uebera|| CyrilPeponnet: Can you provide some more details (e.g., which version are you using, did you recently upgrade, ...)?
17:47 CyrilPeponnet from what I can see, when the CPU spike occurs, the glusterfs.rmtab file is accessed a lot
17:47 CyrilPeponnet sure 3.5.2 no recent upgrade
17:47 CyrilPeponnet 3 nodes, one vol in replica 3
17:48 mcpierce joined #gluster
17:48 CyrilPeponnet actually if I reboot the node, it will be better for several days, then start to hang again
17:49 haomaiwang joined #gluster
17:52 CyrilPeponnet @uebera|| I have even disabled the logs as it was writing way too much, but no luck. Only nfs clients are hanging, not the gfs clients
17:55 uebera|| CyrilPeponnet: Could it be related to this? --> https://bugzilla.redhat.com/show_bug.cgi?id=1166862
17:55 glusterbot Bug 1166862: urgent, unspecified, ---, ndevos, NEW , rmtab file is a bottleneck when lot of clients are accessing a volume through NFS
17:56 CyrilPeponnet @uebera|| I opened this issue a while ago
17:56 uebera|| Ah, so it *is* related. :)
17:56 CyrilPeponnet It could be but I already use the /dev/shm
17:57 CyrilPeponnet well for a long time We didn't have this issue (minimised) but now it occurs again
17:57 CyrilPeponnet maybe we reached a threshold by adding more and more clients
17:59 JoeJulian CyrilPeponnet: Is there any way of putting rmtab on a tmpfs?
17:59 CyrilPeponnet @JoeJulian already the case
18:00 CyrilPeponnet /dev/shm/glusterfs.rmtab (workaround I suggested)
18:00 JoeJulian Does that work? maybe symlink the existing rmtab there?
18:13 psilvao Dear people.. How Can I debug a gluster replica, I would like to know what I should examine the log, and what parameters is necesary to watch .., thanks in advance Pablo
18:15 JoeJulian I would look for errors or critical errors in logs (" [EC] "). I look at ,,(extended attributes) on the files on the bricks. I check connectivity, permissions (selinux).
18:15 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://pl.atyp.us/hekafs.org/index.php/2011/04/glusterfs-extended-attributes/
18:15 Rapture joined #gluster
18:16 JoeJulian But feel free to just ask questions here. If you're having a problem, describe it. We all hang out here to learn and help.
18:17 psilvao Joe: thanks the command would be --> grep -r "[EC]" /var/log/glusterfs/*
18:17 JoeJulian grep -r "[EC]" /var/log/glusterfs/* /var/log/glusterfs/bricks/*
18:18 JoeJulian because there are logs everywhere. I also recommend a good log aggregation tool like logstash.
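As a slightly tighter variant of that search, matching only the space-delimited severity letter in gluster's log lines rather than any E or C anywhere (the spaces are what JoeJulian's " [EC] " above refers to), one could run something like:

    # only lines logged at Error or Critical severity
    grep -r " [EC] " /var/log/glusterfs/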
18:21 jcastill1 joined #gluster
18:27 jcastillo joined #gluster
18:30 psilvao Thanks Joe
18:45 hchiramm joined #gluster
18:50 Trefex joined #gluster
18:52 cholcombe JoeJulian, it looks like when you add-brick to a volume that it knows what the replica factor is and you don't need to specify it.  is that true?
18:52 JoeJulian correct
18:52 cholcombe awesome :)
18:52 JoeJulian You only need to specify replica (or stripe) if you'
18:52 JoeJulian you're changing it.
18:52 cholcombe i see
18:53 cholcombe i don't think i will be
18:56 anil joined #gluster
18:58 cholcombe it looks like the formula used for the Number of Bricks: in volume info is Distribute x Replica x Stripe = Total
19:02 JoeJulian correct
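As an illustration of both points (no replica count needed when adding bricks, and the Distribute x Replica x Stripe formula), a sketch with placeholder host and brick names:

    # growing an existing replica-3 volume by one more replica set;
    # "replica 3" does not need to be repeated
    gluster volume add-brick myvol server4:/bricks/b1 server5:/bricks/b1 server6:/bricks/b1
    # "gluster volume info myvol" would then report something like:
    #   Number of Bricks: 3 x 3 = 9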
19:02 psilvao JoeJulian: another question: is the self-heal service on by default when you create the replica with 2 bricks, or is it necessary to enable it?
19:03 JoeJulian It's on by default.
19:06 cyberbootje1 joined #gluster
19:06 rotbeard joined #gluster
19:11 VeggieMeat_ joined #gluster
19:12 malevolent_ joined #gluster
19:14 psilvao joejulian: We have a setup that serves gluster data through Apache. What we have observed: when you look at the DirectoryIndex on Apache A you can see all the files, but if you ask Apache B for one of those files, Apache B says it is not there, despite what you see from Apache A. Is this typical behavior of gluster, or could it be a bug? This architecture has two bricks in replica.
19:15 CyrilPeponnet @JoeJulian sorry for the late answer was in meeting. but yes relocated the file to /dev/shm helps for a while but now I'm wondering (because I have more and more clients), that's became a bottle neck
19:16 CyrilPeponnet I have almost the same amount of nfs connections on each node, but we are using a VIP (active/passive), so a lot of new connections / mounts / umounts happen on this single node, and this node experiences glusterfs spikes like 100% of 16 cores for a few seconds from time to time. During this time the rmtab file seems to be accessed and modified.
19:16 CyrilPeponnet (there is an option to relocate the file)
19:18 Trefex1 joined #gluster
19:19 CyrilPeponnet so I'm looking for a way to disable this feature
19:19 Trefex1 joined #gluster
19:23 Trefex joined #gluster
19:24 ndevos CyrilPeponnet: if your servers are on 3.7, you can set the option to /- to disable the rmtab, see http://review.gluster.org/10379
19:24 CyrilPeponnet 3.5.2
19:24 Trefex joined #gluster
19:24 * ndevos was expecting that
19:24 CyrilPeponnet :p
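For reference, both the relocation workaround CyrilPeponnet already uses and the disable switch ndevos mentions are set through the same volume option; a sketch, assuming the nfs.mount-rmtab option referenced in these review links (the volume name is a placeholder):

    # keep rmtab on tmpfs so client (un)mounts do not hit disk
    gluster volume set myvol nfs.mount-rmtab /dev/shm/glusterfs.rmtab
    # on 3.7 and later, disable the rmtab entirely
    gluster volume set myvol nfs.mount-rmtab /-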
19:24 JoeJulian CyrilPeponnet: Facebook assigns 4 floating IPs per server and uses round robin dns to resolve the client to a nfs server. If a host goes down, those 4 ips are distributed to 4 different servers. This allows failure without overwhelming a single server.
19:25 ndevos CyrilPeponnet: I could backport it to 3.6 and 3.5, but you would need to update the servers when 3.5.5 gets out
19:25 CyrilPeponnet @JoeJulian Yes I'm looking that approach too, any LB recommendations ?
19:26 JoeJulian I prefer a simple rrdns without an LB. Fewer moving parts.
19:26 CyrilPeponnet @ndevos: As long as the update process can be done live, I'm ok with that
19:26 CyrilPeponnet @JoeJulian me too, I will check with our DNS IT if they can offer such a service
19:27 ndevos CyrilPeponnet: I have not heard of any issues when updating the 3.5.x versions to 3.5.y
19:27 JoeJulian CyrilPeponnet: it's kind-of built in. All it takes is multiple A records for a single hostname.
19:27 CyrilPeponnet @JoeJulian can you explain more how these 4 floating IPs work ?
19:27 ndevos 3.5.4 -> 3.5.5 would then have 2 fixes, I think, so the difference is extremely small
19:28 JoeJulian ndevos: Where's the FB slides?
19:28 ndevos JoeJulian: I dont think we have them :-/ maybe spot knows
19:28 CyrilPeponnet @JoeJulian ok you must provide the same weight for all of them I guess
19:28 JoeJulian yes
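A minimal sketch of that round-robin DNS setup as BIND-style zone records (names and addresses are placeholders); with equal TTLs and no weighting, DNS servers typically rotate through the answers:

    ; all gluster NFS servers answer for one service name
    gnfs    IN  A   10.0.0.11
    gnfs    IN  A   10.0.0.12
    gnfs    IN  A   10.0.0.13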
19:29 ndevos CyrilPeponnet: ctdb supports such configurations out of the box, but I have no idea about the details
19:29 CyrilPeponnet @ndevos if you backport this change to 3.5.5, when is this release expected ?
19:29 CyrilPeponnet and by the way, what is the current *stable supported* release on centos7
19:30 ndevos CyrilPeponnet: what would be a reasonable timeframe for you?
19:30 CyrilPeponnet tomorrow?
19:30 CyrilPeponnet :p
19:30 CyrilPeponnet well by the end of the month
19:30 ndevos 3.5.x will stick around for a while, until 3.8 or 4.0 gets released
19:30 JoeJulian Hehe, he's nicer than I am. My answer is always "yesterday".
19:31 CyrilPeponnet :pp
19:31 CyrilPeponnet You're kind enough to answer my question so I try to be nice :)
19:32 ndevos CyrilPeponnet: would you be able to write a blog post or something about your gluster environment and use-case?
19:32 CyrilPeponnet @ndevos for now we are using keepalived as a VIP mechanism.
19:32 * ndevos likes pacemaker more
19:33 CyrilPeponnet @ndevos sure, James asked me the same a few months ago
19:33 ndevos CyrilPeponnet: maybe I could do the backports sooner then ;-)
19:33 CyrilPeponnet I mean we could do that to explain what we want to achieve (with more or less success)
19:34 CyrilPeponnet So to sum up.
19:34 CyrilPeponnet 1/ I should get rid of the rmtab feature in the next release and update our infra.
19:34 ndevos sure, whatever you can share and shows a workable use-case :)
19:35 CyrilPeponnet 2/ Round-robining the load across several nodes is preferred if I use nfs
19:35 CyrilPeponnet for now I started to migrate from nfs to full gfs.
19:36 CyrilPeponnet but I can't use gfs for home dirs as it doesn't handle submounts (or I can do nasty things with --bind as a workaround)
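The --bind trick mentioned here usually amounts to mounting the whole volume once and bind-mounting just the needed subdirectory; a rough sketch with placeholder paths:

    # mount the full volume once, then expose a single subdirectory elsewhere
    mount -t glusterfs server1:/homes /mnt/homes
    mount --bind /mnt/homes/alice /home/alice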
19:37 CyrilPeponnet @ndevos I can definitely start to work on that later but for now I have infra to fix :/
19:38 ndevos CyrilPeponnet: anything you could put together would be appreciated, I'll try to slip the backports in this or next week
19:39 CyrilPeponnet and by the way, is ~1500 nfs/gfs clients too much for a 3-node setup ? mostly read usage
19:39 ndevos "too much" really depends on your expectations and requirements
19:39 CyrilPeponnet disk and network are fine,
19:40 CyrilPeponnet well, the expectation is reliable operation without hangs :)
19:40 * CyrilPeponnet looking at CTDB
19:40 JoeJulian I would lean toward it not being too much.
19:40 ndevos you only have the hang while you mount, right?
19:40 CyrilPeponnet @ndevos no. when accessing files
19:41 CyrilPeponnet only from the node 2 holding the vip. If I mount the vol using node1 it's fine
19:41 ndevos let me phrase that differently, "when one or more clients execute an (un)mount, performance of other clients is affected"
19:42 CyrilPeponnet that's my guess as we are using autofs and this sh** is mounting / mounting on demand
19:42 CyrilPeponnet *unmounting
19:42 ndevos I guess that you use automounting? meaning mounts are done on demand, and expired after 5 (?) minutes by default
19:42 CyrilPeponnet yep
19:43 ndevos maybe it already helps to increase the timeout that autofs has for expiring mount points?
19:43 ndevos fewer unmounts and therefore fewer mounts needed?
19:43 CyrilPeponnet @ndevos today it's in auto.direct with etc/auto.direct --ghost --timeout=60
19:43 CyrilPeponnet it makes sense
19:44 ndevos wow, 60 seconds?
19:44 lexi2 joined #gluster
19:44 CyrilPeponnet dunno, default configuration let me check that
19:44 CyrilPeponnet yes seconds
19:45 ndevos "man 5 auto.master" --timeout <seconds>
19:45 CyrilPeponnet actually I started to migrate from autofs for those mounts to fstab glusterfs mounts
19:45 CyrilPeponnet as they have to be always up
19:46 ndevos so, maybe increasing the timeout helps? you would need to verify that with your access patterns
19:46 CyrilPeponnet only home dirs will use autofs
19:46 CyrilPeponnet how can I check the access patterns
19:46 CyrilPeponnet I try to do that since this morning
19:47 ndevos I guess /var/log/messages contains autofs entries about mounting and expiring?
19:47 CyrilPeponnet oh
19:47 CyrilPeponnet yes
19:47 CyrilPeponnet let me check that
19:48 CyrilPeponnet nope
19:48 CyrilPeponnet well
19:49 CyrilPeponnet I can check the content of rmtab file
19:49 CyrilPeponnet each seconds
19:49 harish joined #gluster
20:01 CyrilPeponnet @ndevos ok I will update our puppet manifest first to set 3600 instead of 60 as the timeout, will see if it helps
20:27 nsoffer joined #gluster
20:28 stickyboy joined #gluster
20:34 jcastill1 joined #gluster
20:38 psilvao Dear people: is InfiniBand (RDMA protocol) a better option for gluster compared with tcp?
20:39 jcastillo joined #gluster
20:44 Trefex1 joined #gluster
21:05 wushudoin| joined #gluster
21:07 badone_ joined #gluster
21:12 wushudoin| joined #gluster
21:13 jbautista- joined #gluster
21:25 Gill joined #gluster
21:27 jbautista- joined #gluster
21:38 wkf joined #gluster
21:38 CyrilPeponnet @ndevos Well I appended --timeout=3600 on all of the clients we have here but it doesn't help much
21:39 CyrilPeponnet I have ~500 tcp connections to that node, and when 10 are added / removed I get a spike to 1600% CPU (making all the others hang)
21:47 JoeJulian psilvao: infiniband is very fast. 40Gbps and up. It has very low latency so it's better for high iops. With rdma it avoids two context switches saving even more time per iop.
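For completeness, RDMA can be selected as the transport when a volume is created; a sketch with placeholder names, assuming the rdma transport is built in and the InfiniBand fabric is configured:

    # volume reachable over both tcp and rdma transports
    gluster volume create myvol replica 2 transport tcp,rdma \
        server1:/bricks/b1 server2:/bricks/b1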
21:58 wushudoin| joined #gluster
22:03 premera joined #gluster
22:03 wushudoin| joined #gluster
22:27 diegows joined #gluster
22:31 n-st joined #gluster
22:57 CyrilPeponnet @JoeJulian Do you think I can set rmtab to /dev/null ?
23:05 Jandre joined #gluster
23:05 Gill_ joined #gluster
23:31 JoeJulian CyrilPeponnet: ndevos would know better than I. I'd have to check the code.
23:32 CyrilPeponnet I made a test on another setup, sounds harmless
23:33 CyrilPeponnet from what I can see here http://review.gluster.org/#/c/4430/ it just reads and writes to it
23:44 plarsen joined #gluster
23:55 Rapture joined #gluster
23:57 gildub joined #gluster
