IRC log for #gluster, 2016-02-10

All times shown according to UTC.

Time Nick Message
00:00 cpetersen hehe
00:01 cpetersen Whelp, time to spin up openstack on Centos 7 at home.
00:07 JoeJulian If you really want to be cool, you run from git master on archlinux at home like I do... ;)
00:07 cpetersen Man...
00:07 cpetersen You're blowing my mind.
00:07 JoeJulian hehe
00:19 mowntan joined #gluster
00:19 mowntan joined #gluster
00:19 mowntan joined #gluster
00:29 calavera joined #gluster
00:33 theron joined #gluster
00:37 vmallika joined #gluster
00:38 cuqa hello, during upgrade of gluster is it normal that I cannot use the volume with the new version appropriately?
00:41 wiza joined #gluster
01:03 Champi joined #gluster
01:06 gildub joined #gluster
01:12 nickage__ joined #gluster
01:23 johnmilton joined #gluster
01:23 johnmilton greetings
01:24 johnmilton i have a dilemma
01:25 johnmilton i have a 2x2 gluster volume that was recently rebalanced after a node went down. one brick's capacity isn't reflecting the correct usage. it doesn't appear to be replicated from the other node (the other set of bricks replicated fine)
01:44 sonicrose joined #gluster
01:45 sonicrose hi all...  i came up with 1 more question (for now)  ...   Is it possible to take an existing volume that is just 1 brick (non-replicated) and make it into a replicated volume, by adding another brick(s) ?
01:48 johnmilton yes
01:49 sonicrose johnmilton: thanks...   do I just do add-brick <vol> replica 2 <new brick1> <new brick2> ?
01:51 sonicrose i guess i should be as clear as possible, as I have 25TB I cannot lose... it's all on a single brick right now, and I want to decommission the current server after moving all the data onto 3 new servers (with 2 bricks each), so it should end up a "Distribute 3/Replica 2", and I'm hoping to keep the volume online while the data is moving over
01:52 johnmilton how many nodes do you have now?
01:53 CyrilPeponnet maybe the safest is to create a separate cluster then sending the data from a mountpoint to another
01:54 sonicrose johnmilton: initially it is just 1 node, 1 brick, 1 volume
01:54 sonicrose however i just did peer probe on 3 new nodes that each has 2 bricks available, that's as far as i've gotten
01:54 CyrilPeponnet and you want to go to 3 nodes, 2 bricks each, 1 vol
01:55 sonicrose CyrilPeponnet: that is correct
01:55 sonicrose i have all 4 nodes currently in the cluster
01:56 CyrilPeponnet What *I* will do is create a separate cluster (maybe time to update) and sync the data from a mount point to another. This way you are 100% no to loose data.
01:57 sonicrose CyrilPeponnet: that was going to be my "plan B" if it i can't do the transfer online
01:57 CyrilPeponnet I mean maybe not the *best* solution but I think it's the safest one.
01:58 sonicrose ive never had issues doing add-brick  remove-brick before it always worked fine, but i never converted a Distribute volume into Distribute-Replica before
01:58 CyrilPeponnet that's the point. Adding brick is fine, but migrating from replicated to D-R... :/
01:59 sonicrose yes, this is why I've come to #gluster :D  i was hoping for some feedback from someone who has done the same before
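A rough sketch of the command shapes being discussed, with hypothetical volume and brick names; the exact sequence depends on the layout and gluster version, so test it on a throwaway volume first:

    # turn the existing single brick into a replica 2 pair
    gluster volume add-brick myvol replica 2 new1:/data/brick1
    # grow to distribute 3 x replica 2 by adding two more pairs
    gluster volume add-brick myvol new2:/data/brick1 new3:/data/brick1
    gluster volume add-brick myvol new2:/data/brick2 new3:/data/brick2
    gluster volume rebalance myvol start
    # later, migrate the original brick off the old server before decommissioning it
    gluster volume replace-brick myvol old1:/data/brick1 new1:/data/brick2 commit force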
02:00 shyam joined #gluster
02:12 nishanth joined #gluster
02:15 nickage_ joined #gluster
02:36 glaif joined #gluster
02:41 wiza joined #gluster
02:46 valkyr1e joined #gluster
02:46 valkyr1e hello, I am testing GlusterFS with 2 servers, replicated one another in a trusted pool
02:46 valkyr1e a 3rd machine, a client is connecting to one of the 2 servers to mount the disk
02:46 valkyr1e but when I shut down the other server, the client will hang on 'ls /shared_dir' and 'df -f' for sometime
02:46 valkyr1e before it's back to normal
02:46 valkyr1e my shared_dir is mounted at /mnt/tmp and here is what sudo strace ls /mnt/tmp stuck at
02:46 valkyr1e stat64("/mnt/tmp",
02:46 valkyr1e for 1 minute or so, then it will continue and return data
02:46 valkyr1e any ideas?
02:46 lanning you sure it isn't 42 seconds?
02:48 valkyr1e haven't 'time' that
02:48 valkyr1e but could be
02:48 valkyr1e ah, https://thornelabs.net/2015/02/24/change-gluster-volume-connection-timeout-for-glusterfs-native-client.html
02:48 glusterbot Title: Change Gluster Volume Connection Timeout for GlusterFS Native Client (at thornelabs.net)
02:49 valkyr1e I googled it and read the doc but couldn't find the info
02:49 valkyr1e 42s returned some info
02:49 valkyr1e thanks, I'll keep reading
02:50 lanning brick timeout defaults to 42 seconds
02:50 valkyr1e I see
02:50 lanning waiting for possible recovery
02:50 valkyr1e what would be the best practice for GlusterFS?
02:51 valkyr1e any potential issues if I reduce the timeout to, say, 2-5 seconds or so?
02:51 lanning depends on requirements. there are pros and cons for changing it in either direction.
02:51 valkyr1e quite aggressive I assume
02:52 lanning You will get bricks marked down and re-establishing all file descriptors and locks is a very expensive operation.
02:54 lanning @pingtimeout
02:54 glusterbot lanning: I do not know about 'pingtimeout', but I do know about these similar topics: 'ping-timeout'
02:54 lanning @ping-timeout
02:54 glusterbot lanning: The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
02:54 valkyr1e make sense
02:55 valkyr1e thank you, I'll continue reading
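For reference, the timeout lanning describes is the per-volume network.ping-timeout option (42 seconds by default); a sketch of changing it, with a hypothetical volume name:

    gluster volume set myvol network.ping-timeout 42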
03:05 vmallika joined #gluster
03:06 shyam joined #gluster
03:07 JoeJulian CyrilPeponnet: "This way you are 100% no to loose data." I do hate when my data gets loose. ;)
03:13 CyrilPeponnet :p
03:22 JoeJulian @forget ping-timeout
03:22 glusterbot JoeJulian: The operation succeeded.
03:24 JoeJulian @learn ping-timeout as  The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. With an average MTBF of 45000 hours for a server, even just a replica 2 would result in a 42 second MTTR every 2.6 years, or 6 nines of uptime.
03:24 glusterbot JoeJulian: The operation succeeded.
03:25 lanning that depends on the size of the cluster... :P
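Rough arithmetic behind that factoid, for a single replica-2 pair: with a 45000-hour MTBF per server, one of the two servers fails about every 22500 hours (~2.6 years); 42 seconds of interruption per 22500 hours is 42 / (22500 * 3600) ≈ 5.2e-7 unavailability, i.e. about 99.99995% availability, just past six nines.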
03:38 jhc76 Is it possible to do rolling upgrades with gluster? when gluster3.8 comes out, can it do upgrades on 1 node at a time without bringing down the service?
03:41 gem joined #gluster
03:43 atinm joined #gluster
03:43 jhc76 i found this but I'm scared of that phrase "...feeling adventurous" https://vbellur.wordpress.com/2013/07/15/upgrading-to-glusterfs-3-4/ i never been adventurous with massive data. I never will be.
03:48 bharata-rao joined #gluster
03:54 mowntan joined #gluster
03:54 mowntan joined #gluster
03:55 nehar joined #gluster
03:56 ramteid joined #gluster
04:01 shyam1 joined #gluster
04:08 itisravi joined #gluster
04:10 itisravi joined #gluster
04:11 shubhendu joined #gluster
04:12 calavera joined #gluster
04:15 coredump joined #gluster
04:19 kdhananjay joined #gluster
04:21 calavera_ joined #gluster
04:32 tswartz joined #gluster
04:32 calavera joined #gluster
04:35 nbalacha joined #gluster
04:41 calavera joined #gluster
04:42 JoeJulian lanning: Not unless you're doing stripe or disperse. With a replica 2 distribute volume if you lose a server that's not in use, I'm pretty sure you never notice the ping timeout, so only the peers in the replica set you're actively using should cause that.
04:42 JoeJulian jhc76: I'm doing one now from 3.4 to 3.6.
04:43 JoeJulian 332TB.
04:52 calavera joined #gluster
04:56 Manikandan joined #gluster
04:56 kanagaraj joined #gluster
04:57 nehar joined #gluster
04:59 rafi joined #gluster
05:08 ramky joined #gluster
05:09 calavera_ joined #gluster
05:10 gowtham joined #gluster
05:11 aravindavk joined #gluster
05:14 jiffin joined #gluster
05:17 calavera joined #gluster
05:19 ndarshan joined #gluster
05:19 ramky joined #gluster
05:26 RameshN joined #gluster
05:28 rcampbel3 joined #gluster
05:30 calavera joined #gluster
05:35 Apeksha joined #gluster
05:39 SOLDIERz joined #gluster
05:40 skoduri joined #gluster
05:46 suliba joined #gluster
05:47 pppp joined #gluster
05:48 karthikfff joined #gluster
05:48 hgowtham joined #gluster
05:50 theron joined #gluster
05:51 karnan joined #gluster
05:51 Bhaskarakiran joined #gluster
05:54 kotreshhr joined #gluster
06:03 kshlm joined #gluster
06:06 dusmantkp_ joined #gluster
06:08 ashiq joined #gluster
06:09 nishanth joined #gluster
06:10 vimal joined #gluster
06:10 anil joined #gluster
06:11 ppai joined #gluster
06:16 pdrakeweb joined #gluster
06:28 CyrilPeponnet @JoeJulian Finger crossed :p
06:29 CyrilPeponnet Dont let your data get loose...
06:34 Saravanakmr joined #gluster
06:35 atalur joined #gluster
06:38 calavera joined #gluster
06:50 gildub joined #gluster
06:56 karnan joined #gluster
06:59 unlaudable joined #gluster
07:02 jtux joined #gluster
07:17 mhulsman joined #gluster
07:18 mhulsman joined #gluster
07:19 karnan joined #gluster
07:26 anrao joined #gluster
07:37 rcampbel3 joined #gluster
07:38 [Enrico] joined #gluster
07:51 jwd joined #gluster
07:53 kovshenin joined #gluster
08:03 Simmo joined #gluster
08:05 mobaer joined #gluster
08:06 abyss joined #gluster
08:06 Simmo left #gluster
08:15 valkyr1e hi I wonder if I can do access control per client/mount points with glusterfs?
08:16 valkyr1e say, a specific client can only access and mount a certains volume
08:17 valkyr1e like I have api-log volume that is only shared between api servers and db-log that is only shared between database servers
08:17 vmallika joined #gluster
08:20 toshywoshy joined #gluster
08:21 dlambrig_ joined #gluster
08:22 dlambrig_ left #gluster
08:25 fsimonce joined #gluster
08:32 SOLDIERz joined #gluster
08:38 ctria joined #gluster
08:47 dusmantkp_ joined #gluster
08:55 ahino joined #gluster
08:56 jwd access control might be the keyword?
08:59 jwd gluster volume set testvol auth.allow 192.168.0.101
09:01 kdhananjay joined #gluster
09:01 Manikandan_ joined #gluster
09:02 Akee joined #gluster
09:03 itisravi joined #gluster
09:04 jiffin joined #gluster
09:09 JoeJulian valkyr1e: Ip address as jwd suggests or identity using ssl. http://gluster.readthedocs.org/en/latest/Administrator%20Guide/SSL/?highlight=ssl
09:09 glusterbot Title: SSL - Gluster Docs (at gluster.readthedocs.org)
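A sketch of the two mechanisms being pointed at, with hypothetical addresses and certificate names (auth.allow takes a comma-separated IP list per volume; the SSL route also needs certificates deployed as described in the linked doc):

    gluster volume set api-log auth.allow 10.0.1.11,10.0.1.12
    gluster volume set api-log client.ssl on
    gluster volume set api-log server.ssl on
    gluster volume set api-log auth.ssl-allow 'api1,api2'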
09:15 arcolife joined #gluster
09:15 muneerse joined #gluster
09:21 atinm joined #gluster
09:23 ppai joined #gluster
09:23 atalur joined #gluster
09:25 jwd anyone got a clue what this means: RPC procedure 2 not available for Program GF-DUMP
09:32 Manikandan joined #gluster
09:34 Kins joined #gluster
09:44 hgowtham_ joined #gluster
09:47 kovshenin left #gluster
09:52 fedele joined #gluster
09:52 gem joined #gluster
09:58 fedele Ciao #gluster
10:02 fedele I have created this volume on my cluster of 32 nodes: gluster volume info output in http://termbin.com/xsjp
10:03 fedele mounted the volume (distributed volume)
10:03 fedele mount -t glusterfs -o transport=rdma ib-wn001:/scratch /scratch1
10:03 fedele and made a simple test
10:04 fedele [root@wn001 scratch1]# dd if=/dev/zero of=test bs=1G count=1 oflag=dsync
10:04 fedele 1+0 records in
10:04 fedele 1+0 records out
10:04 fedele 1073741824 bytes (1,1 GB) copied, 264,655 s, 4,1 MB/s
10:04 fedele I mounted /scratch1 on ib-wn001
10:04 aravindavk joined #gluster
10:05 hgowtham_ joined #gluster
10:05 fedele Can you help me?
10:05 fedele I suspect this is not good performance
10:06 post-factum fedele: dd is considered not the best test for gluster performance measurement
10:06 post-factum s/considered/considered to be/
10:06 glusterbot What post-factum meant to say was: fedele: dd is considered to be not the best test for gluster performance measurement
10:06 bhuddah have you tried iozone or bonnie++?
10:06 glusterbot bhuddah: bonnie's karma is now 9
10:06 bhuddah .,.
10:06 post-factum :D
10:07 bhuddah glusterbot: i don't like you today.
10:07 post-factum glusterbot--
10:07 glusterbot post-factum: glusterbot's karma is now 7
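For reference, minimal invocations of the tools bhuddah suggests, assuming the volume is mounted at /scratch1 (iozone's -a runs its automatic test matrix, -s and -r set file and record size):

    iozone -a -s 1g -r 1m -f /scratch1/iozone.tmp
    bonnie++ -d /scratch1 -u root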
10:07 fedele sure, first of all I tried with bonnie: same results
10:08 arcolife joined #gluster
10:08 bhuddah can you share the results?
10:15 dlambrig_ joined #gluster
10:16 fedele This is the output of bonnie: http://termbin.com/52sd
10:17 Manikandan joined #gluster
10:20 Slashman joined #gluster
10:21 bhuddah that's a little disappointing.
10:26 fedele This is the output of gluster volume status http://termbin.com/tfqq
10:27 bhuddah are you using the nfs server?
10:32 fedele I mounted the volume on ib-wn001 with this command:
10:32 dusmantkp_ joined #gluster
10:32 fedele mount -t glusterfs -o transport=rdma ib-wn001:/scratch /scratch1
10:34 ppai joined #gluster
10:34 poornimag joined #gluster
10:35 Manikandan joined #gluster
10:35 mhulsman1 joined #gluster
10:35 fedele I suppose I don't use NFS server because the output of volume status shows RDMA port 0 for each NFS server
10:36 bhuddah probably.
10:37 fedele bhuddah: I created a distributed volume.
10:37 kdhananjay joined #gluster
10:37 bhuddah i know i know.
10:38 fedele Bhuddah: Do you think I made some mistakes?
10:39 bhuddah i don't know. sorry. have you tried benchmarking a striped volume?
10:39 fedele I will try now
10:41 bhuddah good luck
10:43 fedele bhuddah: a question: in the bonnie output it seems that I get 101 MB/sec for Sequential Output per Block, is this correct?
10:44 aravindavk joined #gluster
10:45 bhuddah it looks like it, yes.
10:46 fedele I made the same test on the standalone volume and this is the performance... so I suppose performance is poor because of the distributed configuration.
10:46 fedele Is it correct?
10:46 shubhendu joined #gluster
10:47 bhuddah i don't think you have collected enough data to support that yet.
10:47 fedele Can you explain?
10:48 bhuddah you've only gathered one data point now. it's far too early to say what causes your problems.
10:51 nishanth joined #gluster
10:51 kshlm joined #gluster
10:52 fedele I would add that while bonnie was running, the test files were created on a remote node (as we expect)
10:52 [Enrico] joined #gluster
10:52 bhuddah well. okay. at least that works.
10:52 atinm joined #gluster
10:53 fedele Now I'm creating a totally striped volume (32 stripes)
10:54 bhuddah i can't wait to see the results!
10:54 Manikandan joined #gluster
10:54 fedele ok, I'll give results when they are available.
10:57 fedele Only a question: is a volume with 32 striped disks a feasible configuration?
10:57 bhuddah sorry, i have no experience with a deployment of that size.
10:58 fedele ok, so I will try. Good bye
10:59 ppai joined #gluster
11:06 amye joined #gluster
11:07 Bhaskarakiran joined #gluster
11:08 abyss^ joined #gluster
11:14 Bhaskarakiran_ joined #gluster
11:14 fedele bhuddah: perf results didn't change if I write a single file: I suppose this is because the entire file ends up on a single brick in the volume
11:15 fedele Performance could be better if I write 32 files to the volume, I suppose
11:17 bhuddah i don't think this should be the case with striping.
11:20 fedele ok, what I wrote concerns the distributed volume.
11:21 fedele I made a dd test that gives me 4.5 MB/sec.... Now I'm trying bonnie
11:24 fedele I would like to test the NUFA translator, but the command 'gluster volume set myvolume cluster.nufa on' doesn't seem to work.
11:30 hgowtham_ joined #gluster
11:31 poornimag joined #gluster
11:32 Wizek joined #gluster
11:43 jiffin1 joined #gluster
11:45 ira joined #gluster
11:47 itisravi joined #gluster
11:49 kshlm Weekly community meeting starts in 10 minutes on #gluster-meeting
11:52 jwd what is the suggested way to move from a 2 node gluster to more nodes
12:02 jdarcy joined #gluster
12:09 shubhendu joined #gluster
12:12 nishanth joined #gluster
12:13 dusmantkp_ joined #gluster
12:26 bluenemo joined #gluster
12:31 robb_nl joined #gluster
12:31 dusmantkp_ joined #gluster
12:39 johnmilton joined #gluster
12:42 chirino_m joined #gluster
12:57 ro_ joined #gluster
12:59 dlambrig_ joined #gluster
13:00 ro_ Hey guys, I have two bricks that are in split brain for the entire root directory of the volume. I've seen lots of articles on resolving this for individual files but can't seem to find any advice on fixing an entire directory. Any insight on how to do this?
13:10 haomaiwang joined #gluster
13:12 RameshN joined #gluster
13:18 kdhananjay joined #gluster
13:20 RameshN_ joined #gluster
13:20 poornimag joined #gluster
13:30 jdarcy joined #gluster
13:35 kshlm joined #gluster
13:35 johnmilton hello, where can i find this documentation...it seems to no longer exist:  http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/
13:36 theron joined #gluster
13:36 johnmilton or, can someone explain the process (I'm just trying to heal one brick in a distributed/replicated volume across two nodes)
13:39 unclemarc joined #gluster
13:40 B21956 joined #gluster
13:42 post-factum glusterbot: heal
13:42 glusterbot post-factum: I do not know about 'heal', but I do know about these similar topics: 'heal-failed', 'targeted self heal'
13:43 post-factum glusterbot: targeted self heal
13:43 glusterbot post-factum: https://web.archive.org/web/20130314122636/http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/
13:43 post-factum see? :D
13:43 post-factum glusterbot++
13:43 glusterbot post-factum: glusterbot's karma is now 8
13:45 Peppard joined #gluster
13:45 B21956 joined #gluster
13:46 jwaibel joined #gluster
13:47 B21956 joined #gluster
13:47 nbalacha joined #gluster
13:50 jwd_ joined #gluster
13:50 poornimag joined #gluster
13:55 kaushal_ joined #gluster
13:59 RameshN__ joined #gluster
14:01 haomaiwang joined #gluster
14:07 theron joined #gluster
14:12 atalur joined #gluster
14:15 julim joined #gluster
14:18 theron joined #gluster
14:20 shaunm joined #gluster
14:24 shyam joined #gluster
14:26 jmarley joined #gluster
14:32 coredump joined #gluster
14:35 johnmilton post-factum: thanks a bunch
14:40 skylar joined #gluster
14:43 kotreshhr left #gluster
14:47 plarsen joined #gluster
14:49 RameshN__ joined #gluster
14:52 hamiller joined #gluster
14:55 valkyr1e so I understand the rationale behind tcp timeout at 42s
14:55 valkyr1e but the question is, is there any best practice or known setup that could hit zero downtime?
14:56 valkyr1e say I have apps that constantly crawl data and write into glusterfs
14:56 valkyr1e now if one of my servers is down, all data crawled in that 42-second window will be lost
14:57 valkyr1e is there any known best practice that can hit zero downtime with GlusterFS?
14:57 Akee joined #gluster
15:01 haomaiwang joined #gluster
15:07 Melamo joined #gluster
15:07 shyam joined #gluster
15:09 raghu joined #gluster
15:13 plarsen joined #gluster
15:14 ro_ joined #gluster
15:16 theron joined #gluster
15:18 mindscratch joined #gluster
15:22 jiffin joined #gluster
15:22 theron joined #gluster
15:26 aravindavk joined #gluster
15:33 bennyturns joined #gluster
15:37 tertiary joined #gluster
15:39 tertiary can anyone confirm that glusterfs 3.7 on Ubuntu 14.04 has a bug where it will fail to automount the volume because the network is not yet available? i have seen this in google searches for previous versions....
15:39 Akee joined #gluster
15:40 calavera joined #gluster
15:41 calavera joined #gluster
15:42 RameshN_ joined #gluster
15:43 Melamo joined #gluster
15:51 luizcpg joined #gluster
15:54 dusmantkp_ joined #gluster
15:56 farhoriz_ joined #gluster
15:58 rcampbel3 joined #gluster
16:01 haomaiwang joined #gluster
16:06 fgd tertiary: have you tried adding '_netdev' to the list of settings in fstab?
16:09 tertiary yeah its already there, also tried adding nobootwait
16:10 calavera joined #gluster
16:10 JoeJulian valkyr1e: your resumption is incorrect. If a switch fails an you lose communication with a server until lldp resolves a new path, that application that's crawling the server will hang for 42 seconds or until the network is reestablished (better happen in a lot less than 42 seconds). If it does not reestablish, the connection times out to that server. If you have a replicated volume, the crawl continues exactly where it paused.
16:10 JoeJulian s/resumption/presumption/
16:10 glusterbot What JoeJulian meant to say was: valkyr1e: your presumption is incorrect. If a switch fails an you lose communication with a server until lldp resolves a new path, that application that's crawling the server will hang for 42 seconds or until the network is reestablished (better happen in a lot less than 42 seconds). If it does not reestablish, the connection times out to that server. If you have a
16:10 glusterbot replicated volume, the crawl continues exactly where it paused.
16:10 tertiary in the logs i get dns resolution failed and failed to connect to remote host. once im booted and in the terminal i can mount it without a problem
16:11 valkyr1e that depends on the crawler, does it?
16:11 valkyr1e doesnt*
16:12 valkyr1e some apps may panic when it failed to write anything down for 42 seconds
16:12 JoeJulian If the app times out and that timeout is unconfigurable, yes.
16:13 JoeJulian (though if it's open source, you can always fix it yourself)
16:13 valkyr1e that's one approach
16:13 JoeJulian But I've never seen an app that times out reading a file.
16:13 valkyr1e my question remains though
16:13 valkyr1e is zero downtime possible with glusterfs
16:14 JoeJulian Zero downtime is an impossiblity. You need to change your thinking to improbabilities.
16:15 valkyr1e thank you, but that's not the answer I am looking for
16:15 valkyr1e or maybe glusterfs not
16:15 JoeJulian The probability of an outage is calculated with the mean time between failures (MTBF) and the mean time to repair (MTTR).
16:15 JoeJulian Good luck with that.
16:16 JoeJulian Find a product with an infinite MTBF.
16:16 JoeJulian If you find one, come back here and tell me about it.
16:16 calavera_ joined #gluster
16:17 JoeJulian Check out http://www.eventhelix.com/RealtimeMantra/FaultHandling/system_reliability_availability.htm
16:17 glusterbot Title: System Reliability and Availability Calculation (at www.eventhelix.com)
16:18 nishanth joined #gluster
16:19 JoeJulian btw... if anyone ever sells you something with zero down time look carefully at the fine print in the contract to make sure they don't have an out, then buy it. WHEN it fails, you'll be in a great place to litigate.
16:20 JoeJulian Just don't be on the receiving end of that litigation.
16:20 nekrodesk joined #gluster
16:25 mhulsman joined #gluster
16:26 mhulsman joined #gluster
16:28 RameshN__ joined #gluster
16:34 RameshN_ joined #gluster
16:37 RameshN joined #gluster
16:38 wushudoin joined #gluster
16:40 RameshN_ joined #gluster
16:42 theron joined #gluster
16:48 chirino joined #gluster
16:51 farhoriz_ joined #gluster
16:59 vmallika joined #gluster
17:01 haomaiwa_ joined #gluster
17:03 farhoriz_ joined #gluster
17:07 farhoriz_ joined #gluster
17:10 atinm joined #gluster
17:10 tertiary nobody has any ideas about patching the failed gluster automount? I think I've tried everything at this point. @Semiosis I saw you on this bug (https://bugs.launchpad.net/ubuntu/+source/mountall/+bug/1103047) any ideas?
17:10 glusterbot Title: Bug #1103047 “mountall causes automatic mounting of gluster shar...” : Bugs : mountall package : Ubuntu (at bugs.launchpad.net)
17:12 dlambrig_ joined #gluster
17:17 theron joined #gluster
17:19 mhulsman joined #gluster
17:20 post-factum valkyr1e: think of each component of any distributed system as a potential place of failure. if it may fail, it will fail, and every HA system is built with this assumption in mind
17:21 post-factum valkyr1e: also there are nice Google SRE chief talk on SRE in general and availability in particular
17:21 post-factum s/there are/there is/
17:21 glusterbot What post-factum meant to say was: valkyr1e: also there is nice Google SRE chief talk on SRE in general and availability in particular
17:21 JoeJulian tertiary: Did you use _netdev? I didn't think it was used with ubuntu, but that's how everybody else did it, so maybe they did implement that somehow.
17:23 JoeJulian btw... when you do finally get away from upstart and start using systemd, you're going to end up so happy. :)
17:23 post-factum valkyr1e: https://www.youtube.com/watch?v=n4Wf14e2jxQ
17:24 tertiary JoeJulian: yes, I have been using _netdev
17:30 nekrodesk joined #gluster
17:31 JoeJulian Well, the only ideas I have for working around a buggy mountall would be to use noauto and either create an upstart job that does wait to the right time to mount, or just put it in rc.local.
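A sketch of that workaround, assuming a volume named myvol served from server1: mark the mount noauto in /etc/fstab and mount it late from rc.local once the network is up:

    # /etc/fstab
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,noauto  0  0
    # /etc/rc.local
    mount -t glusterfs server1:/myvol /mnt/myvol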
17:32 farhoriz_ joined #gluster
17:34 nekrodesk joined #gluster
17:40 nekrodesk joined #gluster
17:40 calavera joined #gluster
17:42 farhoriz_ joined #gluster
17:44 tmartiro joined #gluster
17:44 nekrodesk joined #gluster
17:44 tmartiro hello folks
17:44 rcampbel3 joined #gluster
17:45 tmartiro Could you please help me to resolve my problem related to the split-brain
17:45 tmartiro I created 2 bricks with 2 replication factor
17:45 tmartiro now it works fine
17:45 tmartiro but in log files I see following error
17:46 tmartiro 0-gv0-replicate-0: Unable to self-heal contents of '/' (possible split-brain)
17:46 tmartiro how can I resolve it
17:46 tmartiro gluster volume heal vg0 full , does not help
17:47 nekrodesk joined #gluster
17:48 dlambrig_ joined #gluster
17:49 post-factum glusterbot: split brain
17:49 glusterbot post-factum: To heal split-brains, see https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md . Also see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/ . For additonal information, see this older article https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
17:50 mhulsman joined #gluster
17:52 ro_ I'm having that same issue tmartiro - the problem with those articles is unless I'm missing something they cover split-brains with specific files, whereas the contents of our volume as a whole appear to be out of sync
17:53 calavera_ joined #gluster
17:54 tmartiro ro_, agreed, all the articles are about healing specific files...
17:55 JoeJulian And they broke that link... sheesh
17:55 tmartiro ro_, did you find a solution?
17:55 nekrodesk joined #gluster
17:55 ro_ yeah that one link is broken, I don't have a solution to it yet, no
17:56 JoeJulian This is the hard way: https://github.com/gluster/glusterdocs/blob/master/Troubleshooting/split-brain.md
17:56 glusterbot Title: glusterdocs/split-brain.md at master · gluster/glusterdocs · GitHub (at github.com)
17:56 ro_ I'm pretty hesitant to try anything because it's our production filesystem and I'm worried I'll wipe all the data
17:56 JoeJulian There _should_ be an easier way. Trying to find where the document moved.
17:58 farhoriz_ joined #gluster
17:58 tmartiro I did the following: on my 2 nodes, in the brick folders, I ran this command: find -type f -exec cksum {} \; > /tmp/cecksum1.txt
17:59 tmartiro after the command completed I ran a diff on those files
17:59 tmartiro diff cecksum1.txt cecksum2.txt
17:59 tmartiro and I found out that there are 2 files which are different
17:59 tmartiro < 4294967295 0 ./.glusterfs/indices/xattrop/xattrop-1092f733-c4eb-4c1f-bf47-67bbaf62d134
18:00 tmartiro > 4294967295 0 ./.glusterfs/indices/xattrop/xattrop-f76f5673-088b-4ab4-96f3-87ffa449c391
18:00 calavera joined #gluster
18:00 tmartiro so what should I do next
18:00 tmartiro should I remove these files?
18:01 JoeJulian does .glusterfs/10/92/1092f733-c4eb-4c1f-bf47-67bbaf62d134 or .glusterfs/f7/6f/f76f5673-088b-4ab4-96f3-87ffa449c391 exist?
18:01 haomaiwang joined #gluster
18:01 tmartiro moment , let me check
18:03 HamburgerMartyr joined #gluster
18:05 tmartiro JoeJulian, nope... there are no files like that on both of the nodes
18:05 JoeJulian Then you can remove them.
18:06 JoeJulian Those are indicators showing that those files were in need of self-heal. Since they don't exist, that's irrelevant.
18:06 JoeJulian Someone should file a bug report.
18:06 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
18:06 tmartiro ok thanks
18:06 JoeJulian If an xattrop file exists and no matching gfid exists, the xattrop file should be removed.
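A sketch of how one could spot such stale entries on a brick before removing anything, assuming the brick root is /brick1/gv0 (it only checks whether the matching gfid file exists under .glusterfs; treat the output as a hint, not a delete list):

    cd /brick1/gv0/.glusterfs/indices/xattrop
    for f in xattrop-*; do
        gfid=${f#xattrop-}
        # gfid abcd1234-... maps to .glusterfs/ab/cd/abcd1234-...
        [ -e "../../${gfid:0:2}/${gfid:2:2}/${gfid}" ] || echo "stale: $f"
    done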
18:07 tmartiro ok I removed them
18:08 tmartiro but I still see the errors in the log files
18:08 tmartiro should I perform heal
18:08 tmartiro ?
18:08 farhoriz_ joined #gluster
18:09 nerdcore joined #gluster
18:09 theron joined #gluster
18:09 nerdcore what is the difference between "distributed" and "striped"? I'm a bit confused by this doc: http://gluster.readthedocs.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/
18:09 glusterbot Title: Setting Up Volumes - Gluster Docs (at gluster.readthedocs.org)
18:10 JoeJulian Won't change anything. It's true, it's "Unable to self-heal contents of '/'" because a directory should never get a data mismatch indication. I would just remove any trusted.afr ,,(extended attributes) from the brick roots.
18:10 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://pl.atyp.us/hekafs.org/index.php/2011/04/glusterfs-extended-attributes/
18:10 JoeJulian @stripe
18:10 glusterbot JoeJulian: Please see http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/ about stripe volumes.
18:10 nerdcore also not sure how to create a distributed volume, as the "gluster volume create" line suggested there indicates "[stripe | replica | disperse]" (where is "distributed"?)
18:11 JoeJulian If you have multiples of your stripe, replica, or disperse count, distribution will happen on top of that. So a replica 2 volume with 4 bricks will put (roughly) half the files on the first replica pair and the second half on the second.
18:11 nerdcore I suspect I want "distributed" and not "striped" for my particular use case
18:12 JoeJulian That's generally true.
18:16 nerdcore JoeJulian: you blog post states "Distribute is the default volume configuration of choice." Does this mean that if I do not specify one of "stripe", "replica", or "disperse" it will be distributed?
18:17 kotreshhr joined #gluster
18:17 valkyr1e JoeJulian: zero downtime is a goal that can be chased but cannot be caught
18:17 valkyr1e I understand that well
18:17 valkyr1e however, my concern here is that when one data node is down due to human or system errors
18:17 JoeJulian nerdcore: correct
18:17 valkyr1e the rest of the cluster would fail
18:17 nerdcore JoeJulian: thx!
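For instance, with the brick ordering JoeJulian describes, a hypothetical 4-brick replica 2 create like the one below makes server1/server2 the first replica pair and server3/server4 the second, with files distributed across the two pairs:

    gluster volume create myvol replica 2 \
        server1:/data/brick1 server2:/data/brick1 \
        server3:/data/brick1 server4:/data/brick1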
18:18 valkyr1e now, I know I'm comparing apples and oranges here
18:18 valkyr1e but look at the centralized model with metadata server and separate data nodes
18:18 valkyr1e metadata server would be a huge single point of failure
18:19 valkyr1e but if one data node fails, the rest continue
18:19 JoeJulian Well, to be fair, even ceph has mitigated that with redundant metadata servers now, but in principle I agree.
18:19 valkyr1e in reality, keeping one particular server up as much as possible is not that difficult, say, by throwing more and more money at it
18:20 valkyr1e but keeping every single server of the cluster with that priority/special treatment would be hard
18:20 valkyr1e expensive
18:20 ro_ hey @JoeJulian we recently had an issue where we came in and gluster was completely hung up, anything we did with the mount would hang. We ended up having to kill all gluster processes, unmount, restart gluster and mount again to resolve it.
18:20 JoeJulian Meh, you're mostly going to get in the 45000-47000 hour range, regardless of money.
18:20 ro_ our admins looked into it and said there were no networking issues
18:21 ro_ is there anything we should be looking at to warn us that gluster's getting into a bad state? The logs didn't really indicate anything. Right now they're just complaining about this split brain I'm working on resolving
18:22 JoeJulian ro_: in the future, just as an fyi, if a fuse filesytem is hung, you only need to kill the userspace process that backs it to un-hang it (glusterfs process in this case).
18:22 JoeJulian ro_: Which version?
18:22 ro_ the latest
18:22 ro_ well lemme check to be sure
18:22 JoeJulian 3.7.8? that was released today?
18:22 ro_ I'm pretty sure the latest though
18:22 ro_ 3.7.3
18:23 JoeJulian Oh, not even close. :D
18:23 ro_ that may be off a minor release or two
18:23 ro_ no?
18:23 JoeJulian I strongly recommend anybody upgrade to today's 3.7.8 release. There have been many memory leak bugs fixed.
18:23 calavera joined #gluster
18:24 JoeJulian and memory leaks on a locally mounted filesystem /could/ lead to memory contention and lockup. It's possible.
18:24 nerdcore I'm using 3.6.8 because I used the Ubuntu PPA for 12.04. Bad idea? :P
18:24 ro_ this may be a dumb question but where is 3.7.8?
18:24 JoeJulian nerdcore: I'm stuck there on this old ratty ubuntu server, too.
18:24 ro_ http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-7/x86_64/ i see 3.7.6 here
18:24 glusterbot Title: Index of /pub/gluster/glusterfs/LATEST/EPEL.repo/epel-7/x86_64 (at download.gluster.org)
18:25 nerdcore JoeJulian: well that makes me feel much better TBH :)
18:25 morse joined #gluster
18:25 kotreshhr left #gluster
18:26 JoeJulian I like our new shiny Arch Linux rollout. It's something we can actually keep up to date.
18:26 JoeJulian Rolling distros ftw.
18:29 tmartiro JoeJulian, I've got the trusted.afr attributes by invoking following command - getfattr -m . -d -e hex /brick1/gv0
18:30 tmartiro there are 2 of them
18:30 tmartiro trusted.afr.gv0-client-0=0x000000000000000200000000
18:30 tmartiro trusted.afr.gv0-client-1=0x000000000000000000000000
18:30 nerdcore any thoughts on what would cause my new brick not to start? It created fine but I'm getting a connection refused message on tcp/24007 on IP address 10.5.11.20 which is the server from which I am issuing the commands: http://nerdcore.net/mike/gluster-volume-start-connection-refused-1455128937.txt
18:30 nerdcore *new volume
18:32 nerdcore my iptables rules on this box have a default policy on INPUT of DROP, but specifically allows any incoming connection on 10/8 if it comes on the correct interface. Do you suppose bacula could somehow be trying to connect using an incorrect interface (lo)?
18:33 morse joined #gluster
18:33 JoeJulian Only way to know for sure is to add LOG rules.
18:34 JoeJulian I also highly recommend never using a default policy other than allow so if something happens while applying rules you don't lock yourself out. Leave your last rule as the one that drops whatever's left.
18:35 nerdcore same effect of locking you out if you screw up your accept rule
18:35 nerdcore i use a default policy and I'm used to doing so and it makes good sense to me to do so. That way I can add rules to the bottom more easily
18:37 tmartiro JoeJulian, if I remove the trusted.afr attributes from root folder of a brick, will it mean that the gluster will not replicate anymore ?
18:37 JoeJulian tmartiro: no, it just means it won't think it needs to heal it - which it doesn't.
18:37 tmartiro ok thanks
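A sketch of clearing those root markers, using the attribute names from the getfattr output above and run against the brick path (not the client mount):

    getfattr -m . -d -e hex /brick1/gv0
    setfattr -x trusted.afr.gv0-client-0 /brick1/gv0
    setfattr -x trusted.afr.gv0-client-1 /brick1/gv0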
18:41 nerdcore this is also confusing. when creating the volume I never asked for anything to be done on localhost, only on two LAN IPs (10.5.11.20,10.5.11.10) so the first of these two errors makes some sense, but why does the second error refer to any operation taking place against "localhost"? http://nerdcore.net/mike/gluster-volume-start-connection-refused-1455129583.txt
18:45 ovaistariq joined #gluster
18:47 JoeJulian nerdcore: did you use hostnames for your servers?
18:49 nerdcore no, IPs addresses
18:50 nerdcore I issued the "volume create" command on 10.5.11.20 and used IPs 10.5.11.20 and 10.5.11.10 for the two bricks
18:51 JoeJulian I wonder if they're checking and using localhost if it's local.
18:51 nerdcore oh shit, i think the two /etc/hosts files are not consistent and a shortname in each is becoming confused
18:52 nerdcore 10.5.11.10 thinks 10.5.11.20 has a hostname which 10.5.11.20 refers to as 127.0.0.1 !
18:52 nerdcore *facepalm*
18:53 nerdcore I never asked for such hostname to be used, but `gluster volume status ...` indicates that somewhere along the way a hostname was picked up from /etc/hosts...
18:53 nerdcore I can fix this...
18:53 mhulsman joined #gluster
18:57 nerdcore damn.. same problem...
19:01 haomaiwa_ joined #gluster
19:06 nerdcore disabled some firewall rules (one of the two servers was in a configuration where ACCEPT was the policy and a REJECT rule finished the list; the other server had both a DROP policy and a REJECT rule) and it's started now
19:06 nerdcore what should I be looking for to allow gluster to function which was apparently being blocked a minute ago?
19:06 ron-slc joined #gluster
19:07 nerdcore * what tcp/udp ports should i ACCEPT?
19:08 nerdcore GAH! It appears that there ARE packets belonging to network 10.5.11.0 traversing the "lo" interface on one of my bricks. What's up with that?
19:08 post-factum tcp/24007, tcp/49152-49252
19:09 nerdcore normally I block such packets as they do not belong on this interface
19:09 nerdcore I usually accept 127/8 on interface lo and block everything else
19:10 nerdcore i understood this to be good firewalling practice and it makes good logical sense to me; I've never seen an application attempt to send packets across an incorrect interface like this. Can I tell glusterd / glusterfsd to bind to a specific (non-lo) interface?
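A sketch of accept rules for the ports post-factum listed earlier (glusterd on tcp/24007, brick ports from 49152 up; the source network and exact brick-port range are assumptions to adjust):

    iptables -A INPUT -s 10.5.11.0/24 -p tcp --dport 24007 -j ACCEPT
    iptables -A INPUT -s 10.5.11.0/24 -p tcp --dport 49152:49252 -j ACCEPT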
19:28 nekrodesk joined #gluster
19:29 dlambrig_ joined #gluster
19:31 ovaistariq joined #gluster
19:31 JoeJulian no
19:32 nerdcore wow
19:32 JoeJulian That said, yes, but it's a pain in the butt and you have to write a configuration hook in /var/lib/glusterd/hooks
19:33 nerdcore what would cause glusterd to attempt to communicate wth a 10. IP address over the lo interface? That's just wrong.
19:33 JoeJulian I've never seen that happen.
19:33 nerdcore fair 'nuff
19:33 JoeJulian Perhaps it's a kernel thing.
19:34 JoeJulian I know the kernel does do some shortcuts when a packet is all internal.
19:34 nerdcore i added a firewall rule I don't much like to allow that one IP address over the lo interface :/
19:34 nerdcore i suppose I could limit the tcp port as well to be more strict
19:34 nerdcore the volume is working though! :D
19:34 JoeJulian excellent
19:34 JoeJulian blog about your findings?
19:35 JoeJulian I would be interested.
19:35 nerdcore I think I will...
19:35 nerdcore been meaning  to do more of that anyway
19:35 nerdcore your blog has been quite helpful
19:35 JoeJulian I hear you.
19:35 nerdcore others pass me links to your blog as well; Thank you
19:35 JoeJulian You're welcome. Glad I could help.
19:36 nerdcore might as well start blogging while I wait for 600G to copy onto this volume haha
19:38 nerdcore very aside, but is there an official GlusterFS logo I could attach to such a post? :P
19:38 nekrodesk joined #gluster
19:39 post-factum i'd wonder, why is there a bug chewing a leaf on the official gluster logo?
19:40 post-factum is that something to tell us about? light hint?
19:41 JoeJulian because many ants work together to make a huge colony
19:41 JoeJulian or something like that.
19:42 JoeJulian I've never really been a fan of the ant.
19:42 toshywoshy joined #gluster
19:42 post-factum right, ant
19:42 nerdcore many small ants work together to eat entire plants to their roots
19:43 post-factum a fat drunk penguin after a beer party is not that bad then
19:45 JoeJulian Anyway, the current official logo is https://www.gluster.org/images/antmascot.png
19:47 nerdcore thx
19:49 unclemarc joined #gluster
19:50 theron joined #gluster
19:51 shaunm joined #gluster
19:53 plarsen joined #gluster
19:57 djgerm left #gluster
20:01 haomaiwa_ joined #gluster
20:06 SpeeR joined #gluster
20:09 SpeeR I'm trying to find documentation saying yes/no but can't seem to find the answer. When using the gluster client to mount a volume, does the client handle the failover to another brick gracefully if the brick it's currently using fails?
20:10 Wizek joined #gluster
20:10 JoeJulian SpeeR: Yes and no. There is no failover. The client connects to all the bricks and handles the replication (and distribution, etc).
20:10 tmartiro I'm getting error when trying to access the mounted volume via nfs.... nfs.log throws these errors: W [nfs3.c:4111:nfs3svc_readdir_fstat_cbk] 0-nfs: ff7ccd4b: / => -1 (Input/output error)
20:11 JoeJulian There's no error there.
20:11 JoeJulian despite the word at the end of the line.
20:11 SpeeR so say I have 2 bricks, in a replica, if brick 1 dies, the client will continue on, using brick 2?
20:11 JoeJulian yes
20:12 SpeeR ok, thanks, that's what I was hoping was the case, now to setup the test env
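For the mount itself, a common sketch is to hand the client a fallback volfile server so the initial mount also works when the first server is down (hypothetical hostnames; once mounted, the client talks to all bricks anyway):

    mount -t glusterfs -o backupvolfile-server=server2 server1:/myvol /mnt/myvol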
20:12 tmartiro what about this one  - [nfs3-helpers.c:3480:nfs3_log_readdirp_res] 0-nfs-nfsv3: XID: ff7ccd4b, READDIRPLUS: NFS: 5(I/O error), POSIX: 5(Input/output error), dircount: 4096, maxcount: 32768, cverf: 94253983872616, is_eof: 0
20:13 JoeJulian What's the letter you cut off right before that?
20:13 tmartiro W
20:13 JoeJulian Warning
20:13 tmartiro yep
20:14 JoeJulian So unless something's not working, I ignore anything less than ' E '.
20:14 tmartiro ok, but why then I've got Input/output error ?
20:15 JoeJulian This is where, if there was something failing, I would look at nfs3-helpers.c (line 3480) and figure out what it's doing and what it's expecting.
20:16 JoeJulian But instead, I'm figuring out how to get access to three different networks inside my systemd-nspawn container. :D
20:16 tmartiro :))
20:18 ovaistariq joined #gluster
20:18 tmartiro in docker you can just create 3 different bridges and create the net interfaces in those bridges
20:18 CyrilPeponnet Hey guys, just finished a debugging session with Kotresh
20:19 CyrilPeponnet apparently there are too many fds for the process holding the bricks (he needs to check with the core team).
20:19 CyrilPeponnet I'd like to know what that means? Is there an fd for each file opened by clients?
20:19 CyrilPeponnet (unfortunately he's gone to sleep as I guess he's in India)
20:21 JoeJulian seems pretty likely
20:21 CyrilPeponnet humf
20:22 CyrilPeponnet So I guess it's a dead end for geo-rep :/ I can still run it in xsync but xsync sucks
20:22 JoeJulian shouldn't be. Make more FH available.
20:23 JoeJulian http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
20:23 CyrilPeponnet He told me: "I think it's the issue with fd being greater than FD_SETSIZE"
20:24 CyrilPeponnet is FD_SETSIZE using the ulimit for files ?
20:25 CyrilPeponnet the limit is 6536076 on my nodes and I have around 6500 fd in /proc/pid/fd for bricks process...
20:26 CyrilPeponnet ulimit -n give me 1024 as root user
20:27 CyrilPeponnet and hard limit to 4096
20:27 CyrilPeponnet I don't get it...
20:28 JoeJulian Ah, I see. FD_SETSIZE is a library define <sys/types.h> and would need to be overridden in the source in order to be changed.
20:29 JoeJulian It seems the general consensus, however, is to not use select() but rather to use poll() which doesn't have these restrictions at all.
20:29 CyrilPeponnet I can see that also. Is gluster using select ?
20:29 JoeJulian Where's the error again?
20:30 CyrilPeponnet https://gist.github.com/CyrilPeponnet/1925837fb2d567841955
20:30 glusterbot Title: gist:1925837fb2d567841955 · GitHub (at gist.github.com)
20:31 CyrilPeponnet (I added a comment)
20:31 mhulsman joined #gluster
20:31 JoeJulian This doesn't match what I'm looking at. Which version of gluster?
20:32 CyrilPeponnet 3.6.5
20:32 CyrilPeponnet .1.el7
20:36 JoeJulian Yeah, looks like it's because they're using select(). Using poll() would solve the problem.
20:36 CyrilPeponnet From Vijay: So as guessed, it is the limitation of select to handle only FD_SETSIZE fds (i.e, 1024 by default). From the analysis, we saw around 5000 fds open by brick process, all are pointing to backend gfid hardlinks (.glusterfs/ab/cd/abcd...gfid).
20:37 CyrilPeponnet The obvious solution is to move from select to epoll! But the other question is it expected for a brick process to have so many open fds on backend gfid files? Or is it a fd leak somewhere?
20:37 JoeJulian I have no idea what implementing that change might entail.
20:37 CyrilPeponnet and moving to epoll looks like a better solution to me. we intentionally bump up the rlimit for open files to 1M. If there is a lot of I/O activity happening in parallel, it is not unusual to go beyond 1024 open fds.
20:38 CyrilPeponnet so for what I can understand I'm screwed :p
20:38 JoeJulian especially when you've nearly that many clients.
20:38 JoeJulian Today.
20:38 JoeJulian I know they've implemented epoll in other parts of the code, so there's at least some experience already in changing that.
20:39 JoeJulian And hagarth is Vijay, btw, so he can chime in at any moment. ;)
20:39 CyrilPeponnet Oh :)
20:40 JoeJulian To me it looks simple enough that I wouldn't be afraid of trying it (but I'm really way too busy).
20:41 CyrilPeponnet I just want to know if I'm screwed with this version of gluster (I think so). So I will need to use xsync or manual sync to mimic the geo-rep
20:42 JoeJulian ... and that's saying something because I haven't touched C in 20+ years, so I don't trust my ability very often.
20:45 CyrilPeponnet :p
20:46 arcolife joined #gluster
20:46 gildub joined #gluster
20:52 unclemarc joined #gluster
21:01 haomaiwa_ joined #gluster
21:04 haomai___ joined #gluster
21:05 CyrilPeponnet where can I find a detailed description of xsync vs changelog ?
21:05 CyrilPeponnet @JoeJulian @hagarth
21:05 CyrilPeponnet I want to explore this before writing some f*** rsync scripts
21:15 drankis joined #gluster
21:20 JoeJulian Looks like someone took a shortcut and never created the feature page they were supposed to for changelog.
21:21 nekrodesk joined #gluster
21:22 JoeJulian essentially, changelog created a log that could be consumed by multiple servers allowing them to transfer data in parallel and spread the load. Before that all changes were marked in a directory on the master and the master would poll that directory and rsync the files listed to the target
21:25 Ramereth joined #gluster
21:28 gildub joined #gluster
21:33 nekrodesk joined #gluster
21:39 [1]Ethical2ak joined #gluster
21:39 CyrilPeponnet I have pretty good understanding of changelog now
21:39 coredump|br joined #gluster
21:39 CyrilPeponnet but not xsync
21:40 rcampbel4 joined #gluster
21:40 JoeJulian xsync: make a list of all the files that changed in the last 10 minutes. rsync them to the slave.
21:40 CyrilPeponnet ok and the list is dine by crawling ?
21:40 CyrilPeponnet s/dine/done
21:41 JoeJulian Nope, it's done when changes happen to a file.
21:41 CyrilPeponnet Well anyway, I just came back from a meeting and it has been decided to not use geo-rep...
21:41 the-me_ joined #gluster
21:42 CyrilPeponnet that's sad...
21:42 dlambrig_ joined #gluster
21:42 JoeJulian Forever, or just this iteration?
21:42 wushudoin| joined #gluster
21:42 CyrilPeponnet I think this will be permanent...
21:42 CyrilPeponnet looks like xsync cannot manage deletes
21:43 JoeJulian Or didn't manage it well.
21:43 CyrilPeponnet yeah, I cannot find any updated info on that
21:43 Dave_____ joined #gluster
21:44 CyrilPeponnet that sucks but we need to move on... we are behind schedule :/
21:44 wolsen joined #gluster
21:45 JoeJulian I know how that goes.
21:47 drankis left #gluster
21:47 DJCl34n joined #gluster
21:48 Vaelater1 joined #gluster
21:48 DJClean joined #gluster
21:49 xMopxShell joined #gluster
21:50 wistof joined #gluster
21:50 CyrilPeponnet quick question what is the best volume to host vms (qcow2) with kvm ? D-R 2x2 ? (on two nodes with raid 6 or raid 0)
21:52 JoeJulian I prefer replica 3 with raid0 bricks
21:53 CyrilPeponnet so 3 nodes ?
21:53 JoeJulian Well, 27, but essentially, yes.
21:53 CyrilPeponnet lol
21:54 Wizek joined #gluster
21:55 CyrilPeponnet today I have a distributed volume on top of raid 0 brick on two nodes
21:55 valkyr1e joined #gluster
21:55 JoeJulian That would put me into cardiac arrest.
21:55 CyrilPeponnet this is called
21:55 CyrilPeponnet scratch
21:55 CyrilPeponnet :p
21:55 CyrilPeponnet don't care about the data on this one.
21:56 harish_ joined #gluster
21:56 JoeJulian Ok, then I would survive.
21:56 CyrilPeponnet but for vms I can cut a replica 2 on top of raid 0; will it be better than replica 2 on top of raid 6, or D-R 2x2 on top of raid 0 / raid 6?
21:56 CyrilPeponnet (sorry for the sentence :p)
21:57 CyrilPeponnet 2 nodes, each with 2 raid sets, one raid 0 and one raid 6. To host vms, what is best?
21:57 CyrilPeponnet I can even use libgfapi for them
21:58 scubacuda joined #gluster
21:58 renout_away joined #gluster
21:58 JoeJulian But seriously, we have 24 rust and 6 ssd per server, we raid0 4 each rust giving us 6 bricks per server. 27 servers at replica 3 gives us 72 replicated bricks.
21:58 JoeJulian Still gives us better than 6 nines.
21:59 CyrilPeponnet I don't get the 6 nines
21:59 JoeJulian 99.9999% uptime.
22:00 CyrilPeponnet ok in my case I have only 2 nodes :p
22:00 CyrilPeponnet 12 rust, no ssd
22:00 JoeJulian And you can fudge those numbers, too. If you look at per-file availability, the numbers go through the roof.
22:00 JoeJulian "beyond statistical significance".
22:01 CyrilPeponnet So should I go for replica 2 on top of raid 0 ? or distributed on top of raid 6
22:01 JoeJulian SAS? SATA?
22:01 JoeJulian How much network?
22:01 CyrilPeponnet 10G
22:01 haomaiwa_ joined #gluster
22:01 CyrilPeponnet between the glusters
22:01 CyrilPeponnet not sure for SAS or SATA...
22:01 JoeJulian client-server
22:02 CyrilPeponnet client 1 GB
22:02 CyrilPeponnet 80 hypervisors for now will certainly be extended to 300
22:02 JoeJulian Well then I wouldn't bother using more than two raid0 disks per brick since that'll saturate 1G anyway.
22:03 CyrilPeponnet the current setup is 4 disks in raid 0, 8 in raid 6
22:03 JoeJulian Is that 12 per server, or 12 total?
22:03 CyrilPeponnet per servers
22:04 CyrilPeponnet two raid controlers , one with 8 disks the other with 4
22:04 JoeJulian Ok, so 6 raid0 disks per server, replica 2. That'll max out your network and give you about 4 nines (as good as amazon).
22:04 CyrilPeponnet 7200 sata afaik
22:04 CyrilPeponnet Ok, make sense.
22:04 CyrilPeponnet Thanks for the input
22:04 JoeJulian You're welcome.
22:05 nerdcore JoeJulian: https://openconcept.ca/blog/mikem/adventures-glusterfs
22:05 glusterbot Title: Adventures in GlusterFS | OpenConcept Consulting Inc. (at openconcept.ca)
22:05 JoeJulian Thanks!
22:06 nerdcore it gives some general glusterfs overview, as I have not blogged about it previously, but also details the network issues I encountered today
22:06 nerdcore and how I fixed them :D
22:06 CyrilPeponnet @JoeJulian are you using some custom option on your volume for vms ?
22:06 JoeJulian Just the recommended... one sec...
22:09 JoeJulian https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Configuring_Red_Hat_OpenStack_with_Red_Hat_Storage/sect-Setting_up_Red_Hat_Storage_Trusted_Storage_Pool.html#Tuning_Red_Hat_Storage_Volumes_for_Red_Hat_OpenStack
22:09 glusterbot Title: 3.2. Setting up Red Hat Storage Trusted Storage Pool (at access.redhat.com)
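Assuming the packaged 'virt' group file is present, the usual shorthand for applying that tuning set (quick-read, io-cache, stat-prefetch off, eager-lock and remote-dio on, etc.) in one go, with a hypothetical volume name:

    gluster volume set myvol group virt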
22:10 CyrilPeponnet ok thanks, any recommendation for quorum?
22:13 abyss^ joined #gluster
22:15 JoeJulian Set server quorum. Add a non-storage host to the peer group and set server-quorum-ratio=60 and server-quorum-type=server (per-volume)
22:16 JoeJulian That way you have three computers checking for connectivity. If one of them loses connection to two, it knows it's partitioned and stops serving.
22:16 JoeJulian Without a third, if either lost connection to each other, they'd both stop serving and that wouldn't work so well.
22:17 JoeJulian And like I say, that third machine doesn't have any bricks on it and isn't part of the volume, it's just there to answer pings.
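A sketch of those two settings (the ratio is cluster-wide so it is set on 'all'; the type is per volume, hypothetical volume name):

    gluster volume set all cluster.server-quorum-ratio 60
    gluster volume set myvol cluster.server-quorum-type server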
22:17 mowntan_ joined #gluster
22:17 julim joined #gluster
22:17 CyrilPeponnet make sense.
22:18 DJCl34n joined #gluster
22:18 CyrilPeponnet I will start with no third computer, but this is something I will think about later.
22:19 ovaistar_ joined #gluster
22:19 JoeJulian You could even use a Raspberry Pi.
22:20 CyrilPeponnet because the scratch zone I talked about above will also be used by the vms, so basically as it's a D vol, if one node is gone, I think it's OK for it to stop serving
22:20 Kins_ joined #gluster
22:21 CyrilPeponnet Useful advice. I could use a container for that also
22:21 JoeJulian Yep
22:21 CyrilPeponnet okay time to write some puppet manifest
22:22 CyrilPeponnet thanks @JoeJulian as usual
22:22 djgerm joined #gluster
22:22 DJClean joined #gluster
22:22 HamburgerMartyr_ joined #gluster
22:25 djgerm how do I prevent peers being automatically added upon initial install/start?
22:26 djgerm seems that there's some sorta gossip channel or something and new installs on the same subnet automagically add each other to their peer pool?
22:27 djgerm or…. is there a command to forget all the pool party buddies?
22:30 nekrodesk joined #gluster
22:31 abyss^ joined #gluster
22:34 nerdcore left #gluster
22:35 jwang joined #gluster
22:37 JoeJulian gluster peer detach
22:38 JoeJulian If a server is not (nor has been) part of the peer group, it will not be allowed to join it unless you probe it from a trusted peer.
22:39 djgerm huh… interesting…
22:40 abyss^ joined #gluster
22:41 lanning djgerm: are you cloning gluster servers?
22:41 djgerm I am not issuing any peer probe after install (or rather I am, but I am getting a "$hostname is already part of another cluster" error)
22:42 djgerm lanning: I am attempting to automate the deployment from scratch.
22:42 djgerm not cloning per se.
22:42 djgerm but issuing the same commands at nearly the same time on the same servers.
22:42 djgerm which I am thinking could be the problem
22:42 lanning the probe command must be run from a server that is already a member of a cluster
22:43 JoeJulian Yep, that's the problem. If you peer probe from all the servers simultaneously, one of them will establish a peer group and the rest won't be able to join in.
22:43 lanning basically the cluster has to invite the new member
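In practice that means serializing the bootstrap: pick one node, probe the others from it only once glusterd is up everywhere, and create volumes afterwards (hypothetical hostnames):

    # run on server1 only
    gluster peer probe server2
    gluster peer probe server3
    gluster peer status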
22:43 JoeJulian You'll have to go to Italy to find out how to solve that problem. ;)
22:43 djgerm DEAL! I just need to convince the boss man to send me
22:43 JoeJulian :)
22:44 JoeJulian It's actually surprisingly inexpensive.
22:44 JoeJulian I've spent more going to Boston.
22:46 djgerm interesting…
22:46 djgerm Also… ewww. I'd much rather be in Italy
22:47 JoeJulian So far the most "ewww" city I've been to for a conference was actually San Francisco.
22:47 CyrilPeponnet I don't like it either (been there for almost two years now)
22:47 djgerm well… yeah. San Francisco isn't the gem it used to be… When I was a kid, that was my favorite place… it's horrid now.
22:48 JoeJulian Always nice to come home to Seattle though.
22:48 CyrilPeponnet Vancouver is great, except the weather
22:48 JoeJulian Vancouver is great, I agree.
22:48 CyrilPeponnet (not too far from you after all)
22:48 CyrilPeponnet 2h30 and a border to cross
22:49 JoeJulian I'm actually slightly closer. Edmonds.
22:49 CyrilPeponnet You're in BC ?
22:49 JoeJulian And if Trump gets elected, I might just join you up there.
22:49 JoeJulian Edmonds, not Edmonton.
22:50 CyrilPeponnet Edmonds, Burnaby, BC
22:50 JoeJulian Heh, I did not know that.
22:51 CyrilPeponnet :p
22:54 DV joined #gluster
23:01 haomaiwang joined #gluster
23:08 CyrilPeponnet @JoeJulian how many vms you have on your gluster setup ?
23:17 JoeJulian I'm not sure. Hundreds... One particularly annoying customer (annoying in the sense that I told management not to sell them the ability to do this) which has 10 x 20TB images.
23:22 fale joined #gluster
23:23 fale if I have 2 nodes, a volume has 2 bricks (one per node), and replication 2, will it perform lookups each time I need to open a file?
23:24 dlambrig_ joined #gluster
23:24 JoeJulian Regardless of server configuration, yes. A lookup is done for every open, fstat, getfattr, anything that creates a file descriptor.
23:24 jwd joined #gluster
23:25 fale JoeJulian: thanks
23:35 dlambrig_ joined #gluster
23:45 jwaibel joined #gluster
23:46 cliluw How do I tell which bricks are replicas of each other in a distributed-replicate volume?
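One way to read that off (bricks are listed in creation order, and each consecutive group of replica-count bricks forms one replica set), with a hypothetical volume name:

    gluster volume info myvol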
