
IRC log for #gluster, 2014-01-03


All times shown according to UTC.

Time Nick Message
00:03 redshirtlinux1 joined #gluster
00:06 redshirtlinux1 Hello All, I had a Cisco Catalyst perform a soft reboot over the holidays.  It disconnected several of my gluster peers from their iscsi volumes.  Now on my clients I am missing files and only see a fraction of the space when issuing 'df -h'.  This is despite the fact that I have all peers back online.
00:07 redshirtlinux1 I have a Distributed-Replicate setup
00:07 redshirtlinux1 Am I looking at split brain and if so what steps could help me recover the volume
00:07 JoeJulian My first guess would be that (some of) the iscsi filesystems aren't mounted.
00:08 redshirtlinux1 They are.  On all eight nodes the zpool status is showing as online
00:09 JoeJulian Next I'd look at peer status and volume status.
00:11 redshirtlinux1 Number of Peers: 7
00:11 redshirtlinux1 Hostname: gluster6.acc.local
00:11 redshirtlinux1 Uuid: 8667428f-0e52-4785-9841-3897f4333c28
00:11 redshirtlinux1 State: Peer in Cluster (Connected)
00:11 redshirtlinux1 Hostname: gluster7.acc.local
00:11 redshirtlinux1 Uuid: 7b69d7c7-6b01-4233-ae41-d86fce9b6eb2
00:11 redshirtlinux1 State: Peer in Cluster (Connected)
00:11 redshirtlinux1 Hostname: gluster2.acc.local
00:11 redshirtlinux1 Uuid: 574fc5ee-d7bf-4029-9589-8ae50363a4a9
00:11 redshirtlinux1 State: Peer in Cluster (Connected)
00:11 redshirtlinux1 Hostname: gluster4.acc.local
00:11 redshirtlinux1 Uuid: b8ad1243-03d5-46fb-b9ed-893ef6681953
00:11 JoeJulian use a ,,(pastebin)
00:11 glusterbot I do not know about 'pastebin', but I do know about these similar topics: 'paste', 'pasteinfo'
00:11 JoeJulian @paste
00:11 glusterbot JoeJulian: For RPM based distros you can yum install fpaste, for debian and ubuntu it's pastebinit. Then you can easily pipe command output to [f] paste [binit] and it'll give you a URL.
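Both tools read standard input, so command output can be piped straight to them; a minimal sketch (the volume name is illustrative):

    # RPM-based distros
    yum install -y fpaste
    gluster peer status | fpaste

    # Debian/Ubuntu
    apt-get install -y pastebinit
    gluster volume info glustervol | pastebinit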
00:12 redshirtlinux1 installing now...
00:13 JoeJulian But, honestly, if you read the outputs and they look right to you, that's good enough for me. Showing me something wrong is generally more productive. :D
00:14 redshirtlinux1 It looks right... Peer status is showing total cluster nodes - 1 (it doesn't count itself)
00:15 redshirtlinux1 volume info shows all eight bricks
00:16 redshirtlinux1 4 of the bricks were added two weeks ago prior to christmas... those appear to be the ones that are out of the loop so to speak
00:16 redshirtlinux1 The volume size reported by the clients is the same size that the volume was when I created it a year ago, prior to adding 4 additional bricks
00:17 redshirtlinux1 http://paste.ubuntu.com/6681686/
00:17 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
00:19 redshirtlinux1 So bricks 1-4 are the originals and 5-8 are new... 5-8 I added two weeks ago, and they disappeared after our Cisco Catalyst decided to reboot
00:20 InnerFIRE left #gluster
00:22 redshirtlinux1 I notice that when I do a rebalance I am not seeing iops on 5-8
00:24 redshirtlinux1 However if I go to those bricks directly I can see the contents on each of their zpools
00:32 theron joined #gluster
00:33 DV joined #gluster
00:56 mkzero joined #gluster
01:03 redshirtlinux1 Wow nice Ubuntu 12.04's gluster 3.2.5 doesn't have gluster volume heal
01:03 JoeJulian @latest
01:03 glusterbot JoeJulian: The latest version is available at http://download.gluster.org/pub/gluster/glusterfs/LATEST/ . There is a .repo file for yum or see @ppa for ubuntu.
01:04 redshirtlinux1 Need to get out of this split brain prior to considering an upgrade
01:06 JoeJulian Does the file name show up?
01:07 JoeJulian Also, have you looked in the client log for errors?
01:07 redshirtlinux1 One of the files that I've confirmed is missing wasn't present when I ls'ed each Brick
01:08 JoeJulian If the file is not on the bricks, that wouldn't be split-brain. That would be a file that's not there.
01:09 JoeJulian My guess would be that one (or more) of the iscsi based filesystems wasn't mounted and the files were created on the server root filesystem.
01:09 JoeJulian Just a guess though if you're sure the file is supposed to exist.
01:12 redshirtlinux1 The client is talking with Bricks 1-4 and not 1-8 according to its log file
01:12 JoeJulian Aha
01:13 redshirtlinux1 line starts with option remote-host and it shows the first four bricks but not all of them
01:14 redshirtlinux1 any reason why the client would be ignoring the other bricks
01:14 JoeJulian Sounds like you have a stale configuration on one or more servers. Look in /var/lib/glusterd/vols/$your_volume/info on the various servers (that version might have them in /etc/glusterd/... instead) and see if one of them looks more right than the others.
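A quick way to compare those definitions across servers is to checksum the info file on each one; a minimal sketch, with an illustrative volume name and covering both the newer and the older config locations:

    # run on every server; matching checksums mean the volume definitions agree
    for f in /var/lib/glusterd/vols/glustervol/info /etc/glusterd/vols/glustervol/info; do
        [ -f "$f" ] && md5sum "$f"
    done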
01:15 redshirtlinux1 on client or on bricks?
01:15 JoeJulian servers
01:15 JoeJulian @glossary
01:15 glusterbot JoeJulian: A "server" hosts "bricks" (ie. server1:/foo) which belong to a "volume"  which is accessed from a "client"  . The "master" geosynchronizes a "volume" to a "slave" (ie. remote1:/data/foo).
01:18 redshirtlinux1 The file at /etc/glusterd/vols/glustervol/info appears to match on all 8
01:19 JoeJulian And does it show all 8 bricks?
01:19 redshirtlinux1 count=8 and it shows brick-1 through 8
01:19 JoeJulian Interesting.
01:19 JoeJulian And do you have a /var/lib/glusterd directory?
01:20 redshirtlinux1 no
01:21 cyberbootje JoeJulian: can you tell me more about a possible ext4 FS bug in gluster? can it cause a full blown R/O filesystem on the whole storage server?
01:21 JoeJulian cyberbootje: The ext4 bug used to cause the brick process to go in to an endless loop, causing the client to freeze.
01:22 JoeJulian redshirtlinux1: Can you mount the volume to a different directory and fpaste the new client log please?
01:22 Skidsteerpilot_ joined #gluster
01:22 cyberbootje JoeJulian: ah ok, then that's not what i'm facing
01:23 JoeJulian r/o filesystems are usually caused by a timeout. If that's happening on hardware storage, you have a hardware problem.
01:24 cyberbootje JoeJulian: well i have a replica 2 set, and storage 1 (that one is used by the client) went in R/O and the second storage server took over, the service on the client was not compromised
01:25 cyberbootje JoeJulian: i looked everywhere, in any possible log and i just can't find anything
01:25 JoeJulian cyberbootje: dmesg?
01:25 JoeJulian cyberbootje: if the file system was read-only, how would it write the log? ;)
01:26 redshirtlinux1 Here you go:  http://paste.ubuntu.com/6681964/
01:26 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
01:27 JoeJulian redshirtlinux1: That looks perfect. It's connecting to all 8 bricks in replica 2 sets.
01:27 redshirtlinux1 That log file looks proper... however, the df -h is still missing 9 TB
01:27 cyberbootje JoeJulian: nothing in dmesg and yes, that's true but it would be nice to know why it happened
01:27 JoeJulian What's the brick sizes?
01:28 redshirtlinux1 The first four at 500 GB, 5 and 6 3 TB, 7 and 8 6 TB
01:28 JoeJulian cyberbootje: hmm, I was sure that should be in dmesg when the filesystem goes R/O. Why are you people having odd problems on the day after new years... I'm not all awake you know... ;)
01:29 JoeJulian redshirtlinux1: If it were missing 9TB, it would show 0. You only have a 9TB volume.
01:29 JoeJulian no
01:30 JoeJulian See... not all awake...
01:30 JoeJulian So you're showing a 1TB volume.
01:30 redshirtlinux1 10.210.0.200:/glustervol  1.1T  766G  265G  75% /dog
01:30 JoeJulian Math is supposed to be one of my strong suits...
01:30 redshirtlinux1 Yep which was the original size when I just had the first four bricks at 500 GB each
01:30 redshirtlinux1 A few days ago before the switch rebooted, I had all 9 showing up in df
01:31 cyberbootje JoeJulian: Sorry :-) can't help it, and no there is really nothing in dmesg i'm getting crazy here
01:31 JoeJulian 10... now I've got you doing it.
01:31 JoeJulian df each brick. make sure it's what you think it is.
01:32 JoeJulian cyberbootje: You weren't far from crazy to begin with. ;)
01:32 JoeJulian cyberbootje: Are you mounting your volume via nfs or fuse?
01:32 cyberbootje fuse, native gluster
01:33 mkzero joined #gluster
01:33 JoeJulian Is your kernel up to date for it's distro?
01:33 JoeJulian its
01:33 redshirtlinux1 Yikes, found the problem
01:33 JoeJulian I hate when I screw that one up. It's my worst grammatical bad habit.
01:33 JoeJulian redshirtlinux1: excellent
01:34 cyberbootje JoeJulian: that's just the weird thing.... the client is a VM host, the storage servers with gluster are the shared storage, all machines stayed online, failover worked perfectly
01:34 cyberbootje even the heal
01:34 redshirtlinux1 ZFS didn't remount the zfs after the reboot, so the brick path was back on the root file system
01:34 redshirtlinux1 so it wrote to root
01:34 redshirtlinux1 Ahhh
01:34 redshirtlinux1 :)
01:34 JoeJulian Nice.
01:35 redshirtlinux1 Would you suggest that I move the directory, mount and cp data back on each brick?
01:35 JoeJulian cyberbootje: Glad the redundancy saved you.
01:37 cyberbootje JoeJulian: indeed, but it's still scary
01:37 JoeJulian redshirtlinux1: If I were going to cp the data back, I would mount the volume on that server (after doing as you are inclined and mving the directory), then copy the files onto the client mount rather than the brick directly, assuming those files have changed.
01:37 JoeJulian redshirtlinux1: maybe an rsync --inplace so that you don't waste time copying files that are already current.
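A rough sketch of that recovery path, with illustrative paths rather than redshirtlinux1's actual layout, and assuming the brick process serving that path is stopped first:

    mv /data/brick1 /data/brick1.stray                        # set aside the data written onto the root fs
    zfs mount -a                                              # bring the real brick filesystem back
    mkdir -p /mnt/glustervol
    mount -t glusterfs localhost:/glustervol /mnt/glustervol  # mount the volume as a client on this server
    rsync -av --inplace /data/brick1.stray/ /mnt/glustervol/  # copy through gluster, not onto the brick directly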
01:37 JoeJulian cyberbootje: So how did you recover from R/O?
01:38 redshirtlinux1 k thank you JoeJulian!
01:38 JoeJulian redshirtlinux1: You're welcome.
01:38 cyberbootje JoeJulian: reboot, fsck, reboot and started gluster, it began healing
01:38 cyberbootje now it's fine
01:38 JoeJulian Well that's why there's nothing in dmesg.... :P
01:39 cyberbootje JoeJulian: well, i looked at dmesg before i rebooted :-)
01:39 JoeJulian I'd probably still run smartctl on the disks just to be sure.
01:40 cyberbootje JoeJulian: it's a hardware raid 10 set, i also checked the raid, it was/is in optimal health
01:41 cyberbootje JoeJulian: it would be very weird if one disk in a raid10 could cause this
01:41 JoeJulian I've seen weirder.
01:42 cyberbootje hmm
01:43 JoeJulian Like the boss telling me to figure out why his and one other user's pptp connections get slow at 9:00 when there's no additional load or traffic and all the other vpn connections show no issues... :(
01:44 cyberbootje JoeJulian: tell him that the vpn does not like him working at 9:00 :-)
01:44 JoeJulian I know I don't.
01:45 cyberbootje JoeJulian: i have seen something similar
01:45 JoeJulian I should just put a time-range on it and boot him off at 8:50.
01:46 cyberbootje JoeJulian: we had a problem with power in the summer, at exactly 21:06 servers were acting weird, some of them would even reboot
01:47 JoeJulian Lol. That would not be fun.
01:47 cyberbootje JoeJulian: turns out, there was something wrong with the power, treet light went on at that exact moment causing a power dip
01:47 cyberbootje street lights*
01:51 aravindavk joined #gluster
01:53 sticky_afk joined #gluster
01:54 stickyboy joined #gluster
01:54 plarsen joined #gluster
01:55 harish_ joined #gluster
01:55 johnmark joined #gluster
01:59 Guest60220 joined #gluster
02:05 vpshastry joined #gluster
02:21 tryggvil joined #gluster
02:32 kdhananjay joined #gluster
02:33 bharata-rao joined #gluster
02:50 rcaskey joined #gluster
02:52 mkzero joined #gluster
03:00 kshlm joined #gluster
03:01 overclk joined #gluster
03:04 kdhananjay joined #gluster
03:14 jporterfield joined #gluster
03:19 psyl0n joined #gluster
03:21 jporterfield joined #gluster
03:28 kdhananjay joined #gluster
03:31 shubhendu joined #gluster
03:45 itisravi joined #gluster
03:45 RameshN joined #gluster
03:46 jporterfield joined #gluster
03:54 ababu joined #gluster
03:56 mkzero joined #gluster
04:00 overclk joined #gluster
04:03 mohankumar__ joined #gluster
04:05 dusmant joined #gluster
04:13 shylesh joined #gluster
04:14 aravindavk joined #gluster
04:16 MiteshShah joined #gluster
04:20 MiteshShah joined #gluster
04:22 jporterfield joined #gluster
04:26 kdhananjay joined #gluster
04:27 RameshN joined #gluster
04:45 kevein joined #gluster
04:48 ndarshan joined #gluster
04:48 ppai joined #gluster
04:55 badone joined #gluster
04:57 glusterbot New news from newglusterbugs: [Bug 1048072] Possible SEGV crash in Gluster NFS while DRC is OFF <https://bugzilla.redhat.com/show_bug.cgi?id=1048072>
04:58 RameshN joined #gluster
04:58 zaitcev joined #gluster
05:02 dhyan joined #gluster
05:03 mkzero joined #gluster
05:04 skered- joined #gluster
05:09 mohankumar__ joined #gluster
05:17 CheRi joined #gluster
05:17 vpshastry joined #gluster
05:21 ^rcaskey joined #gluster
05:24 prasanth_ joined #gluster
05:25 davinder joined #gluster
05:43 jporterfield joined #gluster
05:43 kaushal_ joined #gluster
05:44 RameshN joined #gluster
05:46 zwu joined #gluster
05:47 dhyan joined #gluster
05:54 psharma joined #gluster
05:57 glusterbot New news from newglusterbugs: [Bug 969461] RFE: Quota fixes <https://bugzilla.redhat.com/show_bug.cgi?id=969461>
05:59 mohankumar__ joined #gluster
06:06 zwu joined #gluster
06:07 jporterfield joined #gluster
06:09 harish_ joined #gluster
06:18 lalatenduM joined #gluster
06:26 recidive joined #gluster
06:33 ngoswami joined #gluster
06:41 jporterfield joined #gluster
06:43 kaushal_ joined #gluster
06:47 kdhananjay joined #gluster
06:53 zwu joined #gluster
06:57 glusterbot New news from newglusterbugs: [Bug 1048084] io-cache : Warning message "vol-io-cache: page error for page = 0x7ff570559550 & waitq = 0x7ff570055130" in client logs filling up root partition <https://bugzilla.redhat.com/show_bug.cgi?id=1048084>
07:02 mkzero joined #gluster
07:09 harish_ joined #gluster
07:11 tor joined #gluster
07:11 vimal joined #gluster
07:12 mkzero joined #gluster
07:13 bala joined #gluster
07:16 ekuric joined #gluster
07:38 kdhananjay joined #gluster
07:38 yk joined #gluster
07:39 shubhendu joined #gluster
07:43 rastar joined #gluster
07:57 ctria joined #gluster
07:59 davinder joined #gluster
08:09 eseyman joined #gluster
08:11 jporterfield joined #gluster
08:29 kdhananjay joined #gluster
08:32 shylesh joined #gluster
08:44 MiteshShah joined #gluster
08:46 7CBAANJOY joined #gluster
08:46 tor joined #gluster
08:51 glusterbot New news from resolvedglusterbugs: [Bug 1029492] AFR: change one file in one brick,prompt "[READ ERRORS]" when open it in the client <https://bugzilla.redhat.com/show_bug.cgi?id=1029492>
08:54 saurabh joined #gluster
08:56 rjoseph joined #gluster
08:57 glusterbot New news from newglusterbugs: [Bug 1029482] AFR: cannot get volume status when one node down <https://bugzilla.redhat.com/show_bug.cgi?id=1029482>
08:58 Philambdo joined #gluster
09:13 rastar joined #gluster
09:14 bolazzles joined #gluster
09:21 glusterbot New news from resolvedglusterbugs: [Bug 1029496] AFR: lose files in one node, "ls" failed in the client, but open normally <https://bugzilla.redhat.com/show_bug.cgi?id=1029496> || [Bug 1029506] AFR: “volume heal newvolume full” recover file -- deleted file not copy from carbon node <https://bugzilla.redhat.com/show_bug.cgi?id=1029506>
09:25 ababu joined #gluster
09:27 glusterbot New news from newglusterbugs: [Bug 973183] Network down an up on one brick cause self-healing won't work until glusterd restart <https://bugzilla.redhat.com/show_bug.cgi?id=973183>
09:31 davinder joined #gluster
09:34 ppai joined #gluster
09:41 psharma joined #gluster
09:48 satheesh joined #gluster
09:50 ndarshan joined #gluster
09:50 CheRi joined #gluster
10:07 psharma joined #gluster
10:29 RameshN joined #gluster
10:38 CheRi joined #gluster
10:52 glusterbot New news from resolvedglusterbugs: [Bug 811311] Documentation cleanup <https://bugzilla.redhat.com/show_bug.cgi?id=811311>
11:00 pk joined #gluster
11:05 jag3773 joined #gluster
11:10 diegows joined #gluster
11:18 rastar joined #gluster
11:18 rastar joined #gluster
11:23 lalatenduM joined #gluster
11:24 ndarshan joined #gluster
11:24 psyl0n joined #gluster
11:48 andreask joined #gluster
11:55 jiphex joined #gluster
11:55 kdhananjay joined #gluster
11:56 jporterfield joined #gluster
11:57 RameshN joined #gluster
11:57 kkeithley joined #gluster
11:59 vpshastry joined #gluster
12:01 itisravi_ joined #gluster
12:10 asku joined #gluster
12:16 shubhendu joined #gluster
12:17 andreask joined #gluster
12:18 harish__ joined #gluster
12:22 glusterbot New news from resolvedglusterbugs: [Bug 1020270] Rebalancing volume <https://bugzilla.redhat.com/show_bug.cgi?id=1020270>
12:28 glusterbot New news from newglusterbugs: [Bug 1021686] refactor AFR module <https://bugzilla.redhat.com/show_bug.cgi?id=1021686> || [Bug 1008839] Certain blocked entry lock info not retained after the lock is granted <https://bugzilla.redhat.com/show_bug.cgi?id=1008839> || [Bug 997889] VM filesystem read-only <https://bugzilla.redhat.com/show_bug.cgi?id=997889>
12:38 ira joined #gluster
12:50 tor joined #gluster
12:51 bala joined #gluster
12:58 glusterbot New news from newglusterbugs: [Bug 920434] Crash in index_forget <https://bugzilla.redhat.com/show_bug.cgi?id=920434>
13:18 kdhananjay joined #gluster
13:21 jporterfield joined #gluster
13:26 psyl0n joined #gluster
13:37 harish__ joined #gluster
13:38 bennyturns joined #gluster
13:39 vpshastry left #gluster
13:42 Cenbe joined #gluster
13:43 Cenbe joined #gluster
13:59 asku joined #gluster
14:01 tomased joined #gluster
14:02 Shdwdrgn joined #gluster
14:03 psyl0n joined #gluster
14:09 pk left #gluster
14:18 hagarth1 joined #gluster
14:19 chirino joined #gluster
14:21 dhyan Hi, I have mounted the disk partition with "mount /dev/sdb1 /data/vg/brick1" and created the gluster volume with "volume create vg replica 2 node1:/data/vg/brick1 node2:/data/vg/brick1"
14:22 dhyan I just checked the doc and found "mount /dev/sdb1 /data/vg && mkdir /data/vg/brick1" and then create the volume as
14:23 dhyan "volume create vg replica 2 node1:/data/vg/brick1 node2:/data/vg/brick1"
14:24 dhyan the diff is I mounted the filesystem at /data/vg/brick1 while the doc says to just mount /data/vg and then create the directory brick1
14:24 dhyan as of now everything works fine.
14:25 dhyan is there any concern, i.e. should I consider mounting the filesystem at /data/vg and then creating the /data/vg/brick1 directory..?
14:26 theron joined #gluster
14:28 Cenbe joined #gluster
14:36 johnmilton joined #gluster
14:37 plarsen joined #gluster
14:43 ndevos dhyan: by using mountpoint /data/vg and path /data/vg/brick1 for the brick, you will not have the / filesystem filled up in case /data/vg fails to mount on boot
14:45 ndevos the brick process would need /data/vg/brick1 to be available, if it is missing (not mounted) the brick process will be non-functional and exit
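Put together, the layout the docs and ndevos describe looks roughly like this; the device, hostnames and brick name are illustrative:

    mkdir -p /data/vg
    mount /dev/sdb1 /data/vg                 # the whole filesystem is the mountpoint...
    mkdir /data/vg/brick1                    # ...and the brick is a subdirectory of it
    gluster volume create vg replica 2 node1:/data/vg/brick1 node2:/data/vg/brick1

If /data/vg ever fails to mount, /data/vg/brick1 simply will not exist, so the brick process refuses to start instead of silently filling the root filesystem.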
14:46 yk joined #gluster
14:49 jobewan joined #gluster
14:50 vpshastry joined #gluster
14:54 itisravi joined #gluster
14:55 Shdwdrgn joined #gluster
14:58 Guest42919 joined #gluster
14:59 dhyan ndevos: thanks for the tip. I will adopt what the docs says
14:59 ndevos dhyan: you're welcome!
15:00 fyxim joined #gluster
15:00 zapotah joined #gluster
15:01 vpshastry left #gluster
15:06 dbruhn joined #gluster
15:12 chirino joined #gluster
15:15 wushudoin joined #gluster
15:18 mattappe_ joined #gluster
15:40 ira joined #gluster
15:52 LoudNoises joined #gluster
15:57 diegows joined #gluster
16:03 tryggvil joined #gluster
16:08 dbruhn on a directory are the extended attributes supposed to be the same or 0x000000000000000000000000
16:09 aixsyd joined #gluster
16:09 aixsyd Heya guys - so I have a cluster doing a deal. iotop shows that both nodes are Reading a crap ton, and neither are writing. is this expected?
16:10 aixsyd *a heal, not a deal
16:10 aixsyd granted, one of the files is 3.5TB large
16:11 aixsyd gluster volume heal gv0 info shows this: http://fpaste.org/65540/76546113/
16:11 glusterbot Title: #65540 Fedora Project Pastebin (at fpaste.org)
16:15 dhyan joined #gluster
16:15 aixsyd and i assume theres really no way to see a heal's progress, is there?
16:20 dhyan Hi, need some quick help here. Trying to replace a brick with the new file path but it failed:  /data/gv_dcms/brick_dcms or a prefix of it is already part of a volume
16:20 glusterbot dhyan: To clear that error, follow the instructions at http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/ or see this bug https://bugzilla.redhat.com/show_bug.cgi?id=877522
16:20 dhyan I'm just trying to replace a node with the corrected brick path
16:20 dhyan volume replace-brick gv_dcms gfs-ldc-n185:/data/gv_dcms/brick_dcms gfs-ldc-n190:/data/gv_dcms/brick_dcms force
16:21 dhyan sorry, it should be the following command i'm trying
16:21 dhyan volume replace-brick gv_dcms gfs-ldc-n185:/data/gv_dcms/brick_dcms gfs-ldc-n190:/data/gv_dcms/brick_dcms start
16:22 dhyan the diff is gfs-ldc-n185's mount point is /data/gv_dcms/brick_dcms and the gfs-ldc-n190 mount point is /data/gv_dcms/
16:23 dhyan I just created the directory path /data/gv_dcms/brick_dcms on gfs-ldc-n190 node
16:23 dhyan the node gfs-ldc-n190 is already attached to the gluster
16:23 dhyan gluster> peer status
16:23 dhyan Number of Peers: 2
16:23 dhyan Hostname: gfs-ldc-n185
16:23 dhyan Port: 24007
16:23 dhyan Uuid: fc029ae2-21c0-477a-8590-091f230abd49
16:23 dhyan State: Peer in Cluster (Connected)
16:23 dhyan Hostname: gfs-ldc-n190
16:23 dhyan Port: 24007
16:23 dhyan Uuid: 20a2ebdc-74c5-4608-8c9a-77d998b90c66
16:23 dhyan State: Peer in Cluster (Connected)
16:24 dhyan any tip..?
16:26 recidive joined #gluster
16:36 kmai007 joined #gluster
16:37 kmai007 could somebody tell me how to interpret this "Server and Client lk-version numbers are not same, reopening the fds"
16:37 glusterbot kmai007: This is normal behavior and can safely be ignored.
16:37 kmai007 ty glusterbot
16:38 kmai007 is this message of concern " readv failed (No data available)" ?
16:38 Shdwdrgn joined #gluster
16:41 Philambdo joined #gluster
16:42 dhyan OK i found it :)
16:43 dhyan http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/
16:43 glusterbot Title: GlusterFS: {path} or a prefix of it is already part of a volume (at joejulian.name)
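For the record, the fix described on that page amounts to clearing the leftover volume markers from the brick directory before reusing it; a hedged sketch (the brick path is dhyan's, run it only on a brick you really intend to reuse, with its brick process stopped):

    brick=/data/gv_dcms/brick_dcms
    setfattr -x trusted.glusterfs.volume-id "$brick"   # drop the old volume id xattr
    setfattr -x trusted.gfid "$brick"                  # drop the old gfid xattr
    rm -rf "$brick"/.glusterfs                         # remove the old gfid tree
    service glusterd restart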
16:46 eseyman joined #gluster
17:03 premera joined #gluster
17:03 Ritter joined #gluster
17:03 Ritter greetings
17:07 clyons joined #gluster
17:08 Ritter I've got a question about using glusterfs on AWS/EC2 with EBS volumes ... is it at all worth while to have more than one EBS volume on a single EC2 instance?
17:10 _Bryan_ joined #gluster
17:11 JoeJulian semiosis: ^
17:13 JoeJulian Ritter: semiosis is our resident AWS expert. He's usually around. Just hang out for a while, I'm sure he'll pitch in.
17:13 Ritter JoeJulian
17:13 Ritter thanks much
17:17 rotbeard joined #gluster
17:20 Mo__ joined #gluster
17:22 skered- Was 3.4.2 released today?
17:22 semiosis Ritter: yes, use multiple ebs volumes.  the performance of ebs volumes varies greatly
17:23 JoeJulian skered-: yes
17:23 semiosis so distribute files over several of them
17:24 Ritter semiosis: is there any suggested size to break up EBS volumes in to, or how many to use per ec2 instance?
17:24 semiosis all depends on your needs
17:27 Ritter how does glusterfs handle the loss of nodes? either an EBS volume becoming unavailable or the ec2 instance?
17:40 zapotah joined #gluster
17:40 zapotah joined #gluster
17:43 semiosis Ritter: when a server goes down FUSE clients hang for ping-timeout, then disconnect from the bricks on that server & continue with remaining bricks
17:43 semiosis NFS clients connected to that server just get cut off
17:44 semiosis idk how glusterfs handles a single ebs volume going down
17:44 semiosis you should test that & let us know :)
17:45 Ritter so if I have 2 EC2 instances and want the data replicated between the 2 instances (each will be doing read/writes, but no clients)
17:45 semiosis if you can afford it, you should do 3 and use quorum, to avoid a possible split-brain situation
17:46 semiosis what kind of workload do you have?
17:46 Ritter not much really, just sync file storage
17:49 Ritter I'm looking at using 1TB EBS per EC2 instance, any thoughts on how I should break that up 512GB/256GB or keep it a single volume?
17:53 semiosis whats the largest file you plan on storing?
17:53 semiosis s/file/file size/
17:53 glusterbot What semiosis meant to say was: whats the largest file size you plan on storing?
17:54 pat__ bbiaf, reboot time.
17:55 dhyan joined #gluster
17:56 B21956 joined #gluster
17:58 PatNarciso joined #gluster
18:00 Ritter semiosis: nothing much more than 20MB per file
18:01 Ritter but lots of them
18:01 aixsyd anyone able to help answer a heal question?
18:02 aixsyd I've had this awful pain in my heal every time I walk. I've had it since I walked across a bunch of hot coals...
18:03 semiosis Ritter: i'd do several small ebs vols then.  that way if you need to add capacity you can expand individual bricks (by replacing ebs vols with larger ones) instead of having to add bricks & rebalance
18:03 semiosis could start as small as 100G
18:04 semiosis depending on your growth rate
18:04 semiosis could start smaller than that even
18:04 Ritter not sure about having 10 ebs volumes on a single ec2 instance
18:05 Ritter that sounds like headache keeping track of them
18:06 semiosis meh
18:06 Ritter lol
18:06 semiosis better than the alternative imho
18:07 Ritter so, 10x 100GB EBS volumes on 1 EC2 instance defines the brick, right?  and I'll have 2 bricks (using 2 EC2 instances)?
18:08 semiosis not necessarily
18:08 semiosis a brick is a directory-on-a-server
18:08 semiosis normally you would format each ebs vol with xfs and mount them, somewhere like /bricks/myvol{0..9}/brick (where myvol is the volume name)
18:09 Ritter I just had a crazy thought ... so I could do a raid10 with ebs volumes, and create a brick from that?
18:09 semiosis alternatively (and i dont recommend it) you could combine those ebs vols using lvm/mdadm, format the combined volume with xfs, and mount it
18:09 semiosis i strongly discourage that
18:09 Ritter let gluster be the redundancy?
18:10 semiosis only reasons imho to stripe ebs vols are 1. you have extremely large files (100s of GB each), and/or 2. you need exceptionally high single-thread throughput
18:11 semiosis although #2 can be handled better with provisioned IOPS
18:12 Ritter I'm not currently planning to use EBS with provisioned IOPS, I just don't think I'm going to need that amount of throughput
18:12 Ritter I'm just trying to create a better fault-tolerant NFS
18:12 Ritter so if one AZ goes down, service isn't lost
18:13 Ritter but I thought it odd that none of the articles I can find on setting up glusterfs on EC2 suggest using more than one EBS volume
18:14 semiosis links?
18:14 semiosis this is the art of capacity planning.  really you should try different configurations & see what works best
18:14 Ritter http://www.gluster.org/category/ec2-en/
18:15 semiosis but your needs are modest, so i think just a few ebs vols per server, mounted directly (without lvm/mdadm) will work best
18:15 Ritter I was also going to ask about ext4 vs XFS, personally I'd rather use XFS
18:17 semiosis xfs is the most heavily tested & widely used backend filesystem for glusterfs
18:17 semiosis afaik
18:17 Ritter good to hear
18:17 semiosis we usually recommend inode size 512 when formatting xfs
18:18 Ritter I saw that somewhere too, I'll make a note of that
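A hedged example of that formatting step, with an illustrative EBS device name and brick path:

    mkfs.xfs -i size=512 /dev/xvdf      # 512-byte inodes leave room for gluster's extended attributes
    mkdir -p /bricks/foo0
    mount /dev/xvdf /bricks/foo0
    mkdir /bricks/foo0/data             # the directory that is actually handed to gluster as the brick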
18:19 aixsyd semiosis: is it normal for a heal to a) take very long and b) have both nodes only reading their disks, not writing at all?
18:19 semiosis also you should know about the CreateImage API call, which will snapshot the entire server, with all attached EBS vols, and make an AMI you can launch to restore the whole server, data & all
18:20 Ritter semiosis: I used that yesterday to replicate my first ec2 instance  ;)
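With the AWS CLI (assuming it is installed and configured), that call looks roughly like the sketch below; the instance ID and image name are placeholders:

    # snapshots the instance plus all attached EBS volumes into a launchable AMI
    aws ec2 create-image --instance-id i-12345678 \
        --name "gluster-server1-$(date +%Y%m%d)" --no-reboot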
18:20 semiosis aixsyd: yes.  i set the heal alg to full, which performs better for my use case than the default diff alg
18:20 Ritter I didnt know it'd handle the attached ebs volumes though
18:20 aixsyd semiosis: not familiar - wheres that in the docs?
18:21 semiosis Ritter: you should only use it to replace the same gluster server with a new instance of it.  not to make a different server.  glusterfs servers have a UUID which should not be cloned from one machine to another, fyi
18:21 Ritter ok, thanks
18:21 aixsyd semiosis: nvm, found it
18:21 semiosis aixsyd: :)
18:21 aixsyd semiosis: for a large file like 3.5TB - is full REALLY better/quicker than diff?
18:22 aixsyd these are VM disks
18:22 semiosis then diff is probably what you want
18:22 semiosis guessing
18:22 semiosis my workload is pretty much WORM, so i use full
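The algorithm being discussed is an ordinary per-volume option; a hedged example of switching it (volume name illustrative):

    gluster volume set gv0 cluster.data-self-heal-algorithm diff   # checksum and copy only changed blocks
    gluster volume set gv0 cluster.data-self-heal-algorithm full   # copy whole files from the good replica
    gluster volume reset gv0 cluster.data-self-heal-algorithm      # back to the default behaviour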
18:22 aixsyd Is there any way to see a heals progress? heal info is maddeningly unhelpful
18:23 semiosis idk
18:23 aixsyd Oy.
18:27 theron joined #gluster
18:28 daMaestro joined #gluster
18:39 dbruhn So, rebalance in 3.3.1: is there a way to keep your bricks from going AWOL?
18:42 flakrat joined #gluster
18:46 theron joined #gluster
18:54 Philambdo joined #gluster
19:01 Philambdo joined #gluster
19:08 recidive joined #gluster
19:09 aixsyd Gluster read/write performance - two servers with RAID0 will be better than two servers with RAID10, no?
19:10 aixsyd my glusterfs heals are maxing out at about 150MB/s - the pipe is a 10Gb pipe between them, so it's not hitting that wall.. it must be the RAID?
19:14 dbruhn you could use something like iotop to see what is doing what with your disk
19:15 aixsyd i am, and its showing two glusterd threads each at a max of 50M/s
19:15 aixsyd er, sorry, 60M/s
19:16 dbruhn how many disks per raid? and you've confirmed that you are indeed connected at 10GB on both servers?
19:16 aixsyd 4 disks, 2TB each, and yes, iperf shows a 7.41gb/s
19:16 aixsyd about 8.5GB/10s
19:17 aixsyd the raid stripe is set to 64k.
19:17 aixsyd I believe my PERC6i can go up to 1MB
19:17 dbruhn the heal does run at a lower priority
19:18 aixsyd any way to bump it? or is that it. I know you can bump mdadm's rebuilds by a lot
19:18 JonnyNomad joined #gluster
19:18 dbruhn Honestly I am not sure
19:19 aixsyd The way I'm looking at it - I'm going to be storing large VM disks on this cluster - and in the event of a failure, i wanna see how long a diff heal and a full heal takes - and at 120MB/s, and 4TB of space, it'll take the better part of a day
19:19 aixsyd and if theres any way to bump these speeds, i'm down.
19:20 aixsyd i'm wondering if changing the HW Raid stripe to 512k would help...
19:21 dbruhn Well you can test that independently, and if it provides a performance improvement for the files you will be storing, it should provide a performance improvement under gluster
19:22 aixsyd worth a shot!
19:24 aixsyd I wonder why theres two glusterfsd threads... why not three? XD
19:24 bennyturns aixsyd, not sure if this will do anything but what about tuning  key = "cluster.background-self-heal-count"
19:25 aixsyd Any idea what the default is?
19:26 bennyturns aixsyd, just looking at http://www.gluster.org/community/documentation/index.php/Documenting_the_undocumented
19:26 glusterbot Title: Documenting the undocumented - GlusterDocumentation (at www.gluster.org)
19:26 aixsyd looking there, too
19:26 aixsyd says any number..
19:26 bennyturns I can try on my 10G setup
19:27 aixsyd i set it to 4, no change
19:27 aixsyd unless I have to stop/start the volume to see a change
19:27 aixsyd or stop/start the heal
19:27 bennyturns aixsyd, how are you measuring SH throughput?
19:28 aixsyd SH?
19:28 bennyturns aixsyd, self heal
19:28 aixsyd ah
19:28 aixsyd iotop
19:28 bennyturns kk
19:33 aixsyd the higher the number I put for that key, the slower it goes, oddly.
19:33 aixsyd now im maxing out at 100MB/s
19:34 * bennyturns installed iotop and is creating a real big file
19:34 aixsyd :D
19:40 NeatBasis_ joined #gluster
19:42 aixsyd bennyturns: wow.
19:43 aixsyd http://fpaste.org/65578/77819013/
19:43 glusterbot Title: #65578 Fedora Project Pastebin (at fpaste.org)
19:43 aixsyd Reads at 331MB/s, writes at 706MB/s
19:43 aixsyd so gluster is the bottleneck.
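Those raw numbers are the sort you get from a dd run directly against the brick filesystem, bypassing gluster; a hedged sketch with illustrative paths:

    # write test straight to the brick, bypassing the page cache
    dd if=/dev/zero of=/bricks/gv0/ddtest bs=1M count=4096 oflag=direct
    # read it back the same way
    dd if=/bricks/gv0/ddtest of=/dev/null bs=1M iflag=direct
    rm /bricks/gv0/ddtest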
19:44 bennyturns aixsyd, yup I am getting 120 ish MB/sec as well
19:44 aixsyd wtf
19:44 * bennyturns looks
19:44 bennyturns reads are kinda low, I read ~750 and write about 500
19:45 aixsyd in a heal scenario - obviously you're gonna be limited by the slowest read, but at 300ish MB/s, i'm getting 2/3 less
19:45 bennyturns aixsyd, ya I did a 100GB file, heal is gonna take a while
19:45 bennyturns maybe something is throttling?
19:46 aixsyd for both of us?
19:46 bennyturns 120 MB/sec is gigabit speed, hrm
19:46 aixsyd im using 10gig Infiniband
19:46 bennyturns I am on 10G copper
19:47 aixsyd JoeJulian: Know anything about this?
19:47 aixsyd bennyturns: are you using xfs?
19:47 bennyturns yup
19:48 aixsyd is it possible its an xfs issue?
19:48 bennyturns aixsyd, I doubt it, I am thinking something in self heal is throttling the bandwidth?
19:48 aixsyd might be.
19:48 bennyturns lemme ask around here
19:49 recidive joined #gluster
19:53 aixsyd bennyturns: I tried "performance.least-prio-threads" set it to 64, no change. only 2 threads work
19:55 aixsyd performance.enable-least-priority off could be interesting too
19:55 aixsyd try that.
19:56 aixsyd i think thats the issue
19:56 aixsyd i'm getting about 300MB/s write speeds now
19:56 bennyturns aixsyd, cool what was it?  I wanna set it as well
19:56 aixsyd performance.enable-least-priority off
19:56 aixsyd default is on
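The knobs toggled in this exchange are all plain volume options; a hedged recap with an illustrative volume name:

    gluster volume set gv0 performance.enable-least-priority off   # stop treating self-heal I/O as lowest priority
    gluster volume set gv0 cluster.background-self-heal-count 16   # heal more files in parallel
    gluster volume reset gv0                                       # put everything back to defaults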
19:57 bennyturns doh, my heal finished right as I set that :P
19:57 aixsyd aw
19:58 aixsyd okay, HERES an oddity
19:58 * bennyturns creates a new file
19:59 aixsyd I get writes at 300 some MB/s, but i'm getting reads at only 100MB/s, and the write will write, then drop to 0, wait for 3 secs, then write at 300, wait at 0 for 3 secs, etc
19:59 aixsyd o.O
20:03 aixsyd something is very off.
20:04 bennyturns aixsyd, this is what I do for my tuning:
20:04 bennyturns http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
20:04 bennyturns aixsyd, iirc there is a value that helps reads
20:04 bennyturns aixsyd, I have the setup scripted so I never remember everything :P
20:05 semiosis aixsyd: whats the rtt between servers?
20:05 aixsyd rtt?
20:05 semiosis round trip time - ping time?
20:05 semiosis latency
20:05 aixsyd sec
20:06 aixsyd icmp_req=3 ttl=64 time=0.075 ms
20:06 bennyturns This is what I was thinking of: /sys/block/sdb/queue/read_ahead_kb
20:07 aixsyd set to 128
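Checking and raising that readahead value is a one-liner each way; the device name is illustrative, and the change does not survive a reboot unless persisted elsewhere:

    cat /sys/block/sdb/queue/read_ahead_kb         # 128 is the usual default
    echo 4096 > /sys/block/sdb/queue/read_ahead_kb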
20:07 bennyturns aixsyd, u using jumbo frames as well?
20:08 aixsyd doesnt really apply for IB
20:08 aixsyd but my MTU is set to...
20:08 aixsyd MTU:65520
20:08 * bennyturns hasn't played with
20:08 bennyturns IB
20:11 aixsyd this is so bizarre...
20:18 aixsyd no matter what I try, I cannot seem to break the 100MB/s read speed
20:18 bennyturns aixsyd, what do you see when you read on the brick itself?
20:19 aixsyd one sec
20:20 aixsyd cant use dd on a directory
20:20 kmai007 are there docs that go into detail about using WORM ?
20:21 kmai007 @glusterbot WORM
20:21 bennyturns aixsyd, here is what I get http://pastebin.com/9aZLDjuj
20:21 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
20:22 ndk joined #gluster
20:22 aixsyd is that a client mounted to your cluter?
20:22 aixsyd *cluster
20:23 aixsyd I dont have anything else with Infiniband to mount/test =\
20:23 bennyturns aixsyd, ya that is a 10G client connected to 10G servers
20:23 aixsyd bollocks.
20:23 ricky-ticky joined #gluster
20:24 bennyturns things were way faster until I started messing around with things
20:26 aixsyd yeah, i was at 120MB/s, now im at like 80-90
20:26 RedShift joined #gluster
20:27 aixsyd I just did a volume reset and im back to 100-120
20:27 yk joined #gluster
20:30 mattappe_ joined #gluster
20:36 recidive joined #gluster
20:36 dbruhn is "performance.enable-least-priority" supported in 3.3.1, and other than competing with traffic will it help rebalance speeds?
20:36 zaitcev joined #gluster
20:38 mattapperson joined #gluster
20:38 psyl0n joined #gluster
20:43 aixsyd dbruhn: Might
20:44 dbruhn Your answers are as clear as my own! ;)
20:44 aixsyd ;)
20:44 aixsyd i'm learning!
20:44 dbruhn Two weeks ago I added a couple of nodes and ran just a fix layout, it ran all weekend, and then the entire volume decided it was pissed off.
20:45 aixsyd D:
20:45 dbruhn a bunch of bricks went down, and caused all sorts of split-brain tom foolery
20:49 aixsyd i'm so sorry
20:59 dbruhn__ joined #gluster
21:02 diegows joined #gluster
21:05 recidive joined #gluster
21:12 dbruhn__ Meh, it's part of it, just working through the issues and trying to not repeat
21:46 theron joined #gluster
21:48 diegows joined #gluster
21:59 psyl0n joined #gluster
22:14 diegows joined #gluster
22:17 mattappe_ joined #gluster
22:17 MacWinner hi, when updating from gluster 3.4.1 to 3.4.2 using yum repos, can i just update each cluster member independently with yum update?  or do I need to shut everything down first, run the update, then start?
22:20 semiosis usual advice is to upgrade servers before clients, but idk about this particular upgrade
22:21 MacWinner got it. thanks!
22:21 semiosis yw
22:26 Rocky__ joined #gluster
22:32 theron joined #gluster
22:38 Ritter semiosis: "Multiple bricks of a replicate volume are present on the same server. This setup is not optimal." when creating my volume with 10x 100GB EBS volumes
22:38 semiosis yep
22:38 Ritter not optimal?
22:38 semiosis you'll want to give bricks in alternating order, like this...
22:39 semiosis volume create foo replica 2 server1:/a server2:/a server1:/b server2:/b ...
22:39 Ritter ok, thanks
22:39 semiosis where /a & /b are really something like /bricks/foo0/data & /bricks/foo1/data
22:40 semiosis and the ebs/xfs mounts are at /bricks/foo0 & /bricks/foo1
22:40 semiosis make the gluster brick a subdir of the mounted disk fs so that in case the disk isnt mounted then the glusterfs brick dir wont exist, causing error/fail instead of writing to the root mount
22:40 semiosis although i understand there may be other safety checks to prevent that
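Spelled out for Ritter's ten-bricks-per-server layout, the create command could be assembled roughly like this (server and volume names are illustrative):

    bricks=""
    for i in $(seq 0 9); do
        bricks="$bricks server1:/bricks/foo$i/data server2:/bricks/foo$i/data"
    done
    gluster volume create foo replica 2 $bricks

Each consecutive pair of bricks becomes one replica set, so both copies of any file end up on different servers.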
22:41 Ritter hm,.. interesting
22:42 purpleidea first person to fix this issue gets a mention: https://github.com/pradels/vagrant-libvirt/issues/111
22:42 glusterbot Title: vagrant-libvirt unsparsifies boxes · Issue #111 · pradels/vagrant-libvirt · GitHub (at github.com)
22:42 purpleidea (one of the last issues i need to work on before release puppet-gluster+vagrant
22:42 purpleidea )
22:43 purpleidea s/release/releasing/
22:43 glusterbot What purpleidea meant to say was: (one of the last issues i need to work on before releasing puppet-gluster+vagrant
22:43 purpleidea )
22:50 Ritter haha  :)  "Number of Bricks: 10 x 2 = 20"
22:50 Ritter its alive
22:52 dbruhn__ congrats!
22:54 Ritter and it works even
22:55 Ritter thanks very much for your help, semiosis
22:55 semiosis yw
22:55 NeatBasis joined #gluster
23:01 dbruhn__ semiosis, is it normal when running a fix-layout rebalance to only see "in progress" without any other data? everything shows zeros
23:01 semiosis idk whats normal for rebal
23:02 dbruhn__ anyone have an idea? this rebalance has been running for 7 hours now and is showing nothing
23:03 dbruhn__ and the rebalance logs just have this over and over
23:03 dbruhn__ [2014-01-03 23:00:57.452167] I [dht-rebalance.c:1629:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0
23:03 dbruhn__ well that and something saying rebalance is in progress
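For what it's worth, a fix-layout-only rebalance just rewrites directory layouts and migrates no files, so zeroed file counters are not by themselves a failure sign; a hedged example of the commands involved (volume name illustrative):

    gluster volume rebalance gv_data fix-layout start
    gluster volume rebalance gv_data status      # per-node progress; a pure fix-layout run moves no files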
23:07 primechuck joined #gluster
23:15 diegows joined #gluster
23:21 tryggvil joined #gluster
23:33 diegows joined #gluster
23:41 Ritter left #gluster
23:54 tryggvil joined #gluster
23:57 yk joined #gluster
23:57 diegows joined #gluster
