
IRC log for #gluster, 2016-08-12


All times shown according to UTC.

Time Nick Message
00:31 purpleidea joined #gluster
01:12 luis_silva joined #gluster
01:19 Javezim @anoopcs Okay checked this morning and there are so many locked files, performance has also decreased a tonne
01:19 Javezim @anoopcs So since we're not using Fuse it seems to be gluster and not the vfs
01:22 Lee1092 joined #gluster
01:39 PaulCuzner joined #gluster
01:46 kovshenin joined #gluster
01:49 kdhananjay joined #gluster
01:59 kovshenin joined #gluster
02:24 Manikandan joined #gluster
02:28 harish_ joined #gluster
03:01 Gambit15 joined #gluster
03:13 om joined #gluster
03:18 sanoj joined #gluster
03:18 rafi joined #gluster
03:30 shubhendu__ joined #gluster
03:35 atinm joined #gluster
03:42 magrawal joined #gluster
03:48 itisravi joined #gluster
03:51 RameshN joined #gluster
03:51 poornimag joined #gluster
04:13 shubhendu__ joined #gluster
04:27 raghug joined #gluster
04:29 sanoj joined #gluster
04:34 nehar joined #gluster
04:51 nehar joined #gluster
05:01 kotreshhr joined #gluster
05:03 ankitraj joined #gluster
05:04 shdeng joined #gluster
05:06 nbalacha joined #gluster
05:07 Gnomethrower joined #gluster
05:08 ppai joined #gluster
05:10 karthik_ joined #gluster
05:11 kotreshhr joined #gluster
05:15 Manikandan joined #gluster
05:17 jiffin joined #gluster
05:18 ira_ joined #gluster
05:20 ndarshan joined #gluster
05:23 nishanth joined #gluster
05:32 mchangir joined #gluster
05:35 karnan joined #gluster
05:37 ramky joined #gluster
05:39 hgowtham joined #gluster
05:40 kovshenin joined #gluster
05:53 jvandewege joined #gluster
05:53 Bhaskarakiran joined #gluster
05:57 kdhananjay joined #gluster
06:06 hackman joined #gluster
06:06 skoduri joined #gluster
06:06 Muthu joined #gluster
06:10 aspandey joined #gluster
06:16 satya4ever joined #gluster
06:16 jtux joined #gluster
06:16 [diablo] joined #gluster
06:17 kshlm joined #gluster
06:25 msvbhat joined #gluster
06:27 masuberu hi
06:27 glusterbot masuberu: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
06:27 masuberu I would like to meassure the IOPs of my storage appliance under different loads scenarios, are IOR and FIO the bests tools for that?
06:31 derjohn_mobi joined #gluster
06:32 shubhendu joined #gluster
06:36 Manikandan joined #gluster
06:41 jiffin masuberu: I remember FIO results with gluster on the gluster-users/devel ML.
06:41 jiffin Never heard of any IOR results
06:42 masuberu http://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/ior/
06:42 glusterbot Title: IOR (at www.nersc.gov)
06:42 jiffin If possible, can u please try both and then share the result across with gluster ML
06:43 masuberu this is the right link https://github.com/LLNL/ior
06:43 glusterbot Title: GitHub - LLNL/ior: Parallel filesystem I/O benchmark (at github.com)
06:43 ankitraj joined #gluster
06:44 ankitraj joined #gluster
06:44 masuberu unfortunately I don't have a gluster environment yet, I want to benchmark my current storage appliance (Panasas) and then see if gluster would outperform it
06:45 jiffin masuberu: K
06:51 atalur joined #gluster
06:52 shortdudey123 joined #gluster
06:56 Bhaskarakiran_ joined #gluster
06:58 Bhaskarakiran joined #gluster
07:02 anil joined #gluster
07:07 rastar joined #gluster
07:07 rafi joined #gluster
07:10 rafi1 joined #gluster
07:19 aspandey joined #gluster
07:25 om joined #gluster
07:25 ramky joined #gluster
07:25 karnan joined #gluster
07:39 fsimonce joined #gluster
07:40 ju5t joined #gluster
07:49 anoopcs joined #gluster
07:51 karthik_ joined #gluster
07:56 arcolife joined #gluster
08:03 LinkRage joined #gluster
08:04 LinkRage Is there a way to use GlusterFS without FUSE ?
08:05 jri joined #gluster
08:05 ndevos LinkRage: yes, NFS, Samba, libgfapi, qemu+gfapi and probably more
08:05 ndevos and, glusterfs-coreutils, of course
08:08 social joined #gluster
08:08 LinkRage ndevos, Am I required to recompile GlusterFS in order not to use FUSE with it. This is what I'm wondering
08:09 ndevos LinkRage: some features on the server need a fuse mountpoint, without fuse, they wont work
08:10 LinkRage I see
08:10 ndevos LinkRage: it is not really required to recompile, I dont think you can disable fuse while compiling anyway
08:10 LinkRage I see
08:11 ndevos quota is one of those features, maybe geo-replication too, there might be others
08:12 LinkRage I'm just looking for the best performance, that's why I had the idea not to use fuse. I had experience with zfs with compression & fuse - poor performance, while zfs in kernel space was much much faster
08:13 LinkRage ndevos, I don't need such fancy things like geo-replication etc. I just need compression enabled and best performance available. nothing more :)
08:14 ndevos LinkRage: fuse is normally on the client, the bricks (server side filesystems) can be on zfs if you like
08:15 LinkRage I see, so I just need to figure out how to mount it on the client without fuse if possible
08:16 ndevos then you can export it through nfs (with gluster/nfs or ganesha), or samba (and the vfs_glusterfs module)
08:17 ndevos but fuse on the client has the advantage of included HA capabilities, whereas with nfs or samba you need to make those services HA capable
08:18 ndevos also, fuse on the client can talk to all the bricks at once, distributing the load on the network/servers, nfs/samba has a 1:1 connection to clients
08:18 LinkRage I agree, the whole setup with gluster will be used from time to time on newly provisioned machine (they all are a fail-over/backup only when/if the bare-metal machines are dead)
08:18 LinkRage so no HA needed
08:19 LinkRage ndevos, thank you for the last sentence! this is very important info for me :)
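    A rough sketch of the non-FUSE access paths ndevos lists above (NFS, Samba vfs_glusterfs, glusterfs-coreutils), assuming a volume named myvol served from server1; names and paths are placeholders:
        # FUSE mount for comparison (built-in HA, talks to all bricks directly)
        mount -t glusterfs server1:/myvol /mnt/myvol
        # Gluster/NFS (NFSv3) mount, no FUSE on the client
        mount -t nfs -o vers=3 server1:/myvol /mnt/myvol
        # Samba share backed by gfapi, sketch of the smb.conf section
        #   [myvol]
        #       vfs objects = glusterfs
        #       glusterfs:volume = myvol
        #       path = /
        # glusterfs-coreutils, gfapi access without any mount at all
        gfcat glfs://server1/myvol/some/file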
08:19 ghenry joined #gluster
08:19 LinkRage and thank you for everything as well
08:20 Siavash___ joined #gluster
08:20 ndevos LinkRage: glad it helped, let us know here if you have further questions, or send them to gluster-users@gluster.org :)
08:20 devyani7 joined #gluster
08:21 LinkRage ndevos, thank you!
08:21 ghenry left #gluster
08:29 robb_nl joined #gluster
08:34 noobs joined #gluster
08:39 sanoj joined #gluster
08:45 Slashman joined #gluster
08:49 jkroon joined #gluster
08:50 [diablo] joined #gluster
08:52 [diablo] joined #gluster
08:54 jkroon i note yesterday there was a discussion around cluster.op-version ... how do I determine what the newest available op version is for a given installation?  and as to why it's required - i'm guessing this is to determine the lowest common protocol denominator between different daemons?
08:54 ws2k3 joined #gluster
08:56 post-factum jkroon: latest opversion is available in gluster source code
08:56 kxseven joined #gluster
08:57 post-factum jkroon: https://github.com/gluster/glusterfs/blob/master/libglusterfs/src/globals.h#L21
08:57 glusterbot Title: glusterfs/globals.h at master · gluster/glusterfs · GitHub (at github.com)
09:00 jkroon post-factum, thanks.  to confirm - one SHOULD do "gluster volume set all cluster.op-version ???" after all nodes in the cluster have been upgraded?  what's the downside of NOT doing this?  Can this happen automatically perhaps?
09:01 post-factum jkroon: afaik, opversion is not being bumped on upgrade automatically, and that is intentionally
09:01 post-factum jkroon: first of all, all clients must be upgraded before bumping opversion
09:02 post-factum jkroon: second, not bumping opversion will lead to inability to use new features if those are available in new version
09:02 jkroon and those new features may involve certain performance enhancements.
09:04 post-factum jkroon: unlikely
09:10 jkroon post-factum, the reason i ask is we're in a situation where i've got a number of entries marked for split brain, and we were forced (after) to switch off self-heal due to performance being completely destroyed.  currently running on 3.7.4.
09:10 jkroon on the one cluster i'm now getting "volume set: failed: Another transaction is in progress. Please try again after sometime." when trying to set opversion.
09:11 post-factum jkroon: first of all, starting since 3.7 you should disable cluster.entry-self-heal, cluster.metadata-self-heal and cluster.data-self-heal, as those operations are handled by self-heal daemon
09:12 post-factum jkroon: second, you'd better upgrade cluster to latest 3.7 available
09:12 jkroon i'm trying to figure out how to best configure glusterfs to still have reasonable performance, but relatively low risk of split-brain.  unfortunately in both cases we only have 2-way replicate, so quorum options are of no use.
09:12 jkroon post-factum, upgrade is queued for the weekend.
09:12 jkroon ok, so i've set those cluster options.  shd should still be running?
09:12 post-factum jkroon: regarding transaction in progress. either you should wait for timeout to expire (30 mins?), or restart glusterd on all nodes
09:12 jkroon no, been waiting days.
09:13 jkroon restart of the daemons then ...
09:13 jkroon ok, so your recommendation is cluster.{data,entry,metadata}-self-heal: off, cluster.self-heal-daemon: on?
09:14 rastar joined #gluster
09:16 post-factum jkroon: correct
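    A minimal sketch of both points, assuming a volume named mail on glusterfs 3.7.x; the op-version number is only an example, take the real one from the globals.h link above:
        # bump cluster.op-version only after every server AND client is upgraded
        grep operating-version /var/lib/glusterd/glusterd.info    # current value on this node
        gluster volume set all cluster.op-version 30712           # example value, not authoritative
        # turn off client-side self-heal, keep the self-heal daemon on
        gluster volume set mail cluster.data-self-heal off
        gluster volume set mail cluster.entry-self-heal off
        gluster volume set mail cluster.metadata-self-heal off
        gluster volume set mail cluster.self-heal-daemon on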
09:17 bkolden joined #gluster
09:19 ira joined #gluster
09:21 Alghost joined #gluster
09:22 Alghost joined #gluster
09:22 poornimag joined #gluster
09:24 LinkRage ndevos, I have exported a dir (which sits on an ext4 lvm volume) via glusterfs but when I write (touch) a file on the server directory which is exported I don't see the changes (the new file) on the client. If I mount it on another client - then the file is visible.
09:24 jkroon post-factum, if one performs a new installation - are those settings set by default, or did they get inherited from an older install?
09:24 hackman joined #gluster
09:24 post-factum jkroon: self-heal daemon is enabled by default
09:24 post-factum jkroon: but those 3 options are enabled as well, afaik.
09:25 jkroon ok, so good practice to make sure they're off and set them explicitly.
09:25 jkroon another thing to go into my sanity checkers then.
09:26 ndevos LinkRage: you should not write to the bricks directly, Gluster can not manage the contents when changes are made behind its back
09:26 LinkRage ndevos, thank you. I understand
09:27 ndevos :)
09:30 jkroon [2016-08-12 09:29:00.030368] E [MSGID: 106029] [glusterd-utils.c:7161:glusterd_check_files_identical] 0-management: stat on file: /var/lib/glusterd//-server.vol failed (No such file or directory) [No such file or directory]
09:30 jkroon that doesn't make much sense.
09:30 jkroon that's trying to run "gluster volume set mail cluster.metadata-self-heal off"
09:31 ndevos jkroon: strange, and the volume called "mail" exists?
09:31 ndevos the path should be /var/lib/glusterd/mail/mail-server.vol, I think
09:33 ndevos hmm, the only -server.vol file that I have is for the nfs-server, I assume you disabled that on the "mail" volume then?
09:33 ndevos well, even if disabled, the nfs-server.vol exists for me
09:34 * ndevos checked on a 3.8.2 environment, fwiw
09:37 kshlm I kind of remember seeing this occur before.
09:37 kshlm Something about the service data structs not being initialized.
09:37 kshlm It has been fixed in the latest builds afaik.
09:39 nbalacha joined #gluster
09:40 jkroon ndevos, nfs should be disabled yes, but i note that on the one server it's not.
09:40 ndevos jkroon: nfs is enabled/disabled per volume, not per server?
09:41 ndevos jkroon: and trust kshlms reply, he definitely knows more about the glusterd stuff than me :)
09:41 jkroon ndevos, it's been a long time since I did that enabling/disabling.  i'll have to google on the volume vs server, but from what I can tell it's per-brick.
09:42 jkroon no wait, can't be per-brick, that would not make sense.
09:42 kshlm I think this was it https://bugzilla.redhat.com/show_bug.cgi?id=1213295, but the bug report talks of a crash.
09:42 glusterbot Bug 1213295: unspecified, unspecified, ---, amukherj, CLOSED CURRENTRELEASE, Glusterd crashed after updating to 3.8 nightly build
09:43 jkroon even more strange, the output from gluster volume status on the two servers is in disagreement about what is where.
09:43 derjohn_mobi joined #gluster
09:43 jkroon http://pastebin.com/zVXiDgVt
09:43 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
09:44 jkroon https://paste.fedoraproject.org/406630/47099505/
09:44 glusterbot Title: #406630 Fedora Project Pastebin (at paste.fedoraproject.org)
09:45 jkroon definitely relates to nfs.  on the other cluster nfs was never disabled (even though we're not using it there was possibility of it).
09:47 kshlm jkroon, That output occurs when the node on which you ran the command hasn't connected to the other nodes.
09:48 kshlm If you check `gluster peer status` on bagheera, uriel should be in a disconnected state.
09:48 kshlm It happens sometimes.
09:48 kshlm A glusterd restart fixes it in most cases.
09:49 kshlm jkroon, Which version of Gluster are you using?
09:50 kshlm I don't see (yet) any reason for it to occur in 3.8.
09:51 jkroon 3.7.4 at the moment, upgrade to 3.7.13 is queued for the w/e
09:52 jkroon State: Accepted peer request (Connected)
09:52 jkroon on bagheera wrt uriel.
09:53 jkroon glusterd restart on bagheera fixed it.
09:53 jkroon and all NFS servers are now online too - so something was not properly initialized on bagheera.
10:01 kshlm jkroon, Maybe that bug I referenced still exists in 3.7.*. Need to check though.
10:06 azilian joined #gluster
10:07 hgowtham joined #gluster
10:33 ju5t joined #gluster
10:43 msvbhat joined #gluster
10:48 micke joined #gluster
10:55 jith_ joined #gluster
11:01 rastar joined #gluster
11:03 poornimag joined #gluster
11:11 ju5t joined #gluster
11:15 jith_ hi ppai, you clarified my query regarding the gluster-swift concept recently.. i tried the same, that is a Glusterfs volume mounted as the swift disk. But i am getting an authorization failed error.. when i bypass the authorization by removing keystoneauth i am getting a container server error as follows “ proxy-server: Authorization failed for token
11:15 jith_ proxy-server: Invalid user token - deferring reject downstream
11:15 jith_ proxy-server: ERROR with Container server 10.10.15.151:6001/gluster re: Trying to PUT /AUTH_e5210fccb3854059a021a3a849e59f5e/sample: ConnectionTimeout (0.5s) (txn: txf61cf10bdd794950948f4-0057ad5c2c) (client_ip: 10.10.15.156)
11:15 jith_ : Container PUT returning 503 for (503,) (txn: txf61cf10bdd794950948f4-0057ad5c2c) (client_ip: 10.10.15.156)” is that swift config
11:15 jith_ error?
11:21 post-factum @paste
11:21 glusterbot post-factum: For a simple way to paste output, install netcat (if it's not already) and pipe your output like: | nc termbin.com 9999
11:21 post-factum jith_: ^^
11:25 ppai jith_, can you please share me the "pipeline" line in proxy-server.conf
11:31 jith_ ppai: sure sure
11:31 jith_ ppai: pipeline = catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk tempurl ratelimit authtoken container-quotas account-quotas slo dlo versioned_writes proxy-logging proxy-server
11:32 jith_ i intentionally removed keystoneauth, because it is showing authorization failed with keystoneauth
11:32 ppai jith_, remove auth_token and reload proxy server - "swift-init proxy reload"
11:32 ppai jith_, authtoken*
11:35 jith_ ppai:.sure
11:39 shubhendu_ joined #gluster
11:41 jith_ ppai: token error went, but Service unavailable is coming.. when i checked logs, it follows http://paste.openstack.org/show/555957/
11:42 glusterbot Title: Paste #555957 | LodgeIt! (at paste.openstack.org)
11:43 ppai jith_, can you check if container server is running ?
11:51 johnmilton joined #gluster
11:52 jith_ ppai: thanks.. yes its is running
11:54 ppai jith_, please share the way your rings were created. Confirm that container server is listening on port 6001 with netstat -planut | grep 6001
11:54 jkroon /usr/sbin/gluster volume get mail cluster.self-heal-daemon <-- does anybody know a way to make that command output ONLY the value, ie, without the headers and the option name itself?
11:54 glusterbot jkroon: <'s karma is now -23
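    One possible answer to jkroon's question above, a shell sketch that strips the header and option name from the 'gluster volume get' output (the column layout may differ slightly between releases):
        gluster volume get mail cluster.self-heal-daemon | awk '$1 == "cluster.self-heal-daemon" {print $2}'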
11:56 jith_ ppai: sure, it is listening i  checked "tcp        0      0 10.10.15.161:6001       0.0.0.0:*               LISTEN      30120/python"
11:58 ppai jith_, in the failure logs you posted I see the IP as 10.10.15.151 but in your output above it's 10.10.15.161
11:58 jith_ ppai: http://paste.openstack.org/show/555965/
11:58 glusterbot Title: Paste #555965 | LodgeIt! (at paste.openstack.org)
12:00 ppai jith_, is this a single node SAIO environment ? I see a different IP for your account ring
12:01 atalur joined #gluster
12:02 jith_ ppai, thanks for noting the ip in container server log
12:03 jith_ ppai, thanks, yes it is not that SAIO, i was installing mitaka version, but using the old commands
12:04 jith_ u found it right.. i have created the container server and object server as ip added 10.10.15.151 instead of 10.10.15.161
12:04 shubhendu__ joined #gluster
12:10 jith_ ppai: swift is single node only, i was trying it for a while with glusterfs backend
12:10 ppai jith_, ok
12:11 ppai jith_, so I understand all 4 PACO processes running on the same node
12:11 jith_ ppai: yes
12:12 jith_ ppai, i can remove the ring file and builder files right??  in order to create the new rings??
12:14 ppai jith_, yes you can, goto /etc/swift and "rm -f *.builder *.ring.gz backups/*.builder backups/*.ring.gz"
12:15 jith_ ppai, yes thanks really
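    For reference, a sketch of recreating single-node rings with the corrected IP after the cleanup above; part power, replica count and the account/object ports are assumptions, only the container port 6001, the device name "gluster" and the IP 10.10.15.161 come from the conversation:
        cd /etc/swift
        swift-ring-builder account.builder create 10 1 1
        swift-ring-builder account.builder add r1z1-10.10.15.161:6002/gluster 100
        swift-ring-builder account.builder rebalance
        swift-ring-builder container.builder create 10 1 1
        swift-ring-builder container.builder add r1z1-10.10.15.161:6001/gluster 100
        swift-ring-builder container.builder rebalance
        swift-ring-builder object.builder create 10 1 1
        swift-ring-builder object.builder add r1z1-10.10.15.161:6000/gluster 100
        swift-ring-builder object.builder rebalance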
12:32 ju5t joined #gluster
12:32 shubhendu_ joined #gluster
12:34 B21956 joined #gluster
12:37 atinm joined #gluster
12:39 B21956 joined #gluster
12:53 jkroon how can I know if a shd is running for a specific volume?
12:57 shubhendu__ joined #gluster
12:58 rastar joined #gluster
12:59 rwheeler joined #gluster
13:10 Trefex joined #gluster
13:12 shubhendu_ joined #gluster
13:12 dlambrig_ joined #gluster
13:14 unclemarc joined #gluster
13:18 shubhendu_ joined #gluster
13:22 squizzi joined #gluster
13:26 luis_silva joined #gluster
13:30 shubhendu__ joined #gluster
13:40 arcolife joined #gluster
13:43 rastar joined #gluster
13:50 msvbhat joined #gluster
13:52 shubhendu__ joined #gluster
13:58 ramky joined #gluster
14:00 mhulsman joined #gluster
14:00 shyam joined #gluster
14:01 shellclear joined #gluster
14:01 edong23 joined #gluster
14:03 shellclear hi buddies, does someone know how i can set the uid and gid on a gluster volume that will be accessed via nfs
14:03 shellclear ?
14:03 shellclear I've created a volume and mounted it nfs client side
14:08 shellclear is it Option: server.anonuid Option: server.anongid?
14:09 ndevos shellclear: I'm not sure what you mean?
14:09 bowhunter joined #gluster
14:09 ndevos shellclear: do you have a user on the NFS-client that needs to have read/write access to a directory?
14:10 ndevos shellclear: in that case, create the directory on the NFS-mountpoint as root, and use 'chown uid /mnt/volume/directory'
14:11 ndevos and chgrp, or however you want to do that
14:11 gnulnx left #gluster
14:12 ndevos shellclear: if you use nfs-ganesha and NFSv4 mounts, you may need to configure /etc/idmapd.conf and make sure the server running NFS-Ganesha can resolve the username (try "id <username>")
14:12 shellclear ndevos: yes, i have. What i mean to say is that i need all nfs client side users to write to glustervol as the nobody user
14:14 ndevos shellclear: not sure if that can easily be done, maybe there are some mount options you can use on the nfs-client for that (check "man 5 nfs")
14:14 shellclear sorry for my english, I'm studying to improve it
14:15 skylar joined #gluster
14:15 ndevos no problem, I think we're understanding eachother :)
14:17 Gambit15 joined #gluster
14:17 ndevos shellclear: Gluster/NFS only allows "root squash", meaning the UID=0/GID=0 gets mapped to the anongid (by default UID of nfsnobody)
14:18 ramky joined #gluster
14:18 ndevos shellclear: maybe NFS-Ganesha has more options, and allows mapping of any user to nfsnobody, I'm not sure about it
14:22 om guys
14:22 om listing a gluster mounts takes minutes
14:23 om performance is really bad, max 340 Kbps
14:23 ndevos shellclear: hmm, not sure what happens when you would mount with: mount -t nfs -o sec=none <server>:/<export> /mnt
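    A sketch of the squash-related volume options touched on above (server.anonuid/server.anongid at 14:08, root squash at 14:17), assuming the volume is called glustervol; as ndevos notes, Gluster/NFS only remaps UID/GID 0:
        gluster volume set glustervol server.root-squash on
        gluster volume set glustervol server.anonuid 65534   # nfsnobody/nobody, pick the uid you want
        gluster volume set glustervol server.anongid 65534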
14:23 om what's the most common reason for that?
14:23 ndevos om: many files in the directory
14:23 om about 200
14:23 om at most
14:24 hagarth om: many sub-directories in a directory is a known problem for listing
14:24 om we had this issue before, but we fixed it with some config changes
14:24 om now it's back
14:24 hagarth om: what is your volume configuration?
14:24 om https://gist.github.com/andrebron/bd2bb6c6ec7f3ba83e827fd74d70b128
14:24 glusterbot Title: gist:bd2bb6c6ec7f3ba83e827fd74d70b128 · GitHub (at gist.github.com)
14:25 hagarth om: how many bricks?
14:25 om 4
14:25 om the brick nodes are connected by 1 Gbps
14:26 om I was told to do this but old gfs client mounts are stopping the config changes:
14:26 om performance.write-behind off
14:26 om cluster.lookup-unhashed on
14:29 om running gluster server v. 3.7.8
14:29 Muthu joined #gluster
14:29 om but old ubuntu precise servers connect with v 3.6.x I think
14:30 om still, all other servers are hit by performance issues even when using client v. 3.7.8
14:36 hagarth om: you might want to profile your gluster volume to observe the latencies being experienced by various calls
14:37 om 'profile' ?
14:38 hagarth https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
14:38 glusterbot Title: Monitoring Workload - Gluster Docs (at gluster.readthedocs.io)
14:39 hagarth om: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/
14:39 glusterbot Title: Performance Testing - Gluster Docs (at gluster.readthedocs.io)
14:39 hagarth check for client side profiling in the above URL
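    In short, the commands those docs describe, assuming a volume named myvol (the client-side dump location is version dependent, check the Performance Testing page above):
        # server-side profiling: per-brick call counts and latencies
        gluster volume profile myvol start
        # ... run the slow workload ...
        gluster volume profile myvol info
        gluster volume profile myvol stop
        # client-side profiling: ask the io-stats translator on a FUSE mount to dump its counters
        setfattr -n trusted.io-stats-dump -v /tmp/client-profile.txt /mnt/myvol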
14:45 om ok, did that
14:45 om what am I looking for in the output?
14:48 om looks like I have 10 files in split brain
14:49 hagarth check the percentage time spent in various calls both on the client and the server
14:49 hagarth s/server/bricks/
14:49 glusterbot What hagarth meant to say was: check the percentage time spent in various calls both on the client and the bricks
14:57 om this shows some issue probably...?
14:57 om https://gist.github.com/andrebron/45b9de496576ebad33320f79220464a9
14:57 glusterbot Title: gist:45b9de496576ebad33320f79220464a9 · GitHub (at gist.github.com)
14:57 raghug joined #gluster
14:58 om latency is in milliseconds I guess?
14:59 hagarth om: yes, looks like opendir is taking around 80ms
15:00 om well, if this is in ms, then it would be 80 seconds right?
15:01 hagarth om: us (microsecond) is the unit of measurement
15:01 om ah ok
15:01 om then this is good then
15:02 wushudoin joined #gluster
15:02 om I already rebooted all the brick nodes.  that didn't help
15:03 om I also cleared the 10 split brain files, didn't help performance either
15:03 om I'm maxing out at 785 Kbps
15:03 om it was 380 Kbps before
15:03 om but this is all under 1 Mbps making this fs extremely slow
15:06 hagarth om: check latencies & throughput all through the stack (disks, network, client mount etc.)
15:06 hagarth that will help in isolating the problem
15:12 mhulsman joined #gluster
15:15 jtux joined #gluster
15:23 ramky joined #gluster
15:28 RameshN joined #gluster
15:45 shyam joined #gluster
15:50 congpine joined #gluster
15:52 congpine Hi, can anyone help please. I have added 2 bricks to a replicated volume and run rebalance, but I had to stop it in the morning as I was afraid it would cause performance issues for production
15:52 congpine It is now taking longer to look up for files, may be 5-10s delay
15:52 kpease joined #gluster
15:52 congpine and i'm seeing those logs:
15:52 congpine [2016-08-12 15:49:24.300460] I [dht-layout.c:640:dht_layout_normalize] 0-london-vol01r-dht: found anomalies in /gluster_store. holes=1 overlaps=2
15:53 congpine i don't understand what holes and overlaps mean
15:53 congpine and if I run fix-layout will it slow down the performance ?
15:54 Siavash joined #gluster
16:03 shyam joined #gluster
16:08 glustin joined #gluster
16:19 mhulsman joined #gluster
16:19 JoeJulian congpine: To understand how dht extended attributes are used, see https://joejulian.name/blog/dht-misses-are-expensive/ . fix-layout simply walks the directory tree and changes those xattrs on the directories so they each have an equal hash range.
16:19 glusterbot Title: DHT misses are expensive (at joejulian.name)
16:20 JoeJulian A hole or overlap is where there is a hole or overlap in that range between servers.
16:20 JoeJulian It shouldn't affect performance to a measurable degree. It's not doing all that much work.
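    The same thing in command form, using the volume name from the log line above (london-vol01r) and a placeholder brick path; the layout xattr is read on the brick directory, not on the mount:
        gluster volume rebalance london-vol01r fix-layout start
        gluster volume rebalance london-vol01r status
        # per-directory hash range assigned to this brick
        getfattr -n trusted.glusterfs.dht -e hex /data/brick1/gluster_store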
16:20 RameshN joined #gluster
16:26 siavash joined #gluster
16:27 shyam joined #gluster
16:42 dlambrig left #gluster
16:55 RameshN joined #gluster
16:59 bb joined #gluster
17:04 bb Had a quick ? I'm reading through the "An Introduction to Gluster Architecture" paper and am reading the following information about adding additional nodes and scaling linearly. Basically it says 1 node writes at 100MB/s up to 8 nodes equaling 800MB/s. My question is, does this mean 800MB/s total throughput for writes for one client to a mount or are we talking 800MB/s of write throughput total across 8 clients connecting to the 8
17:04 bb node cluster writing at 100MB/s each. Hope that makes sense
17:05 RameshN_ joined #gluster
17:05 MikeLupe joined #gluster
17:06 luis_silva1 joined #gluster
17:13 JoeJulian bb: Ok, you're netflix and you need to feed the same popular file to 100k clients. One possible solution is to use replication to ensure you have enough servers that have the file so that each server can be saturated (obviously excluding network requirements and greatly oversimplifying).
17:15 md2k joined #gluster
17:15 JoeJulian bb: Or you're pandora who, through very smart consumer training, doesn't have to feed the same popular file to all their users, but rather spreads the load among 10k similar files. This is where distribution can spread the load across multiple servers where, statistically, each server will receive an equal share of the load.
17:16 sputnik13 joined #gluster
17:17 JoeJulian There's nothing, specifically, in gluster that will limit your data rates. Hardware bottlenecks are usually your only hindrances.
17:19 hagarth joined #gluster
17:22 RameshN_ joined #gluster
17:25 Wojtek I'm having an issue with Gluster performances. It sometimes takes ~30 seconds to save a 10kb file to the volume. Normal <1s. While there was no other activity on the nodes, I did an strace on one of the gluster processes that had a very high cpu usage. Here's a snapshot of a few seconds: https://paste.fedoraproject.org/401439/38964147/raw/. All my heals are disabled: https://paste.fedoraproj
17:25 Wojtek ect.org/401742/47040976/raw/ and I'm really unsure of what is causing Gluster to read the attributes of all of these files for no apparent reason. I've ran Gluster with debug logs for a few moments and here's what I see. https://paste.fedoraproject.org/402058/70426455/raw/ There's some lines for entry self-heal despite it being turned off. Not sure if the memory pool full is bad or not. And
17:25 Wojtek not sure also why it's giving the 'failed to get the gfid from dict'. Nothing jumps out that would explain the behavior
17:26 bb JoeJulian thanks for the information
17:30 JoeJulian Wojtek: Looks like your file is out-of-sync. Since client-self-heal is off, it's probably waiting for something else to heal the file. Since you also have the self-heal daemon turned off, nothing ever will. I'm surprised it doesn't just hang forever.
17:36 md2k joined #gluster
17:39 bb JoeJulian: so currently I have the following configuration, Type: Distributed-Replicate Transport-type: rdma Number of Bricks: 2 x 2 = 4. My dd to a brick on the node writes at 445 MB/s, the writes im seeing on the mount within the client with the same dd is almost halved at 226 MB/s. I dont believe it to be infiniband network related as my tests on the network throughput have yielded exceptionally high throughput. I guess im just trying
17:39 bb to figure out where im going wrong.
17:40 JoeJulian Your client is writing to two replica simultaneously.
17:41 RameshN joined #gluster
17:42 bb Is there any other reading material outside of the gluster documentation you would recommend, perhaps a book on the topic? I want to try and pick up as much useful info as I can
17:44 JoeJulian A lot of people have told me that my blog is worth reading: https://joejulian.name
17:44 glusterbot Title: JoeJulian.name (at joejulian.name)
17:45 Wojtek +1 for JoeJulian's blog :)
17:45 bb lol thanks
17:46 luis_silva joined #gluster
17:46 Wojtek JoeJulian: You mean the heal settings only disable the actual healing actions - but that gluster will still scan for consistency issues by itself, and then just queue the fixes, which will never happen on my cluster. Is it somehow possible to disable the consistency checks? What about the .glusterfs/indices/xattrop folder?
17:50 JoeJulian Wojtek: That sounds scary... I can't imagine ever doing that on a cluster I would write to... I've never seen a way to disable the checks. I'd have to peruse the source to see if it's possible, but I'm pretty sure off the top of my head that it's not.
17:56 plarsen joined #gluster
18:00 pdrakeweb joined #gluster
18:25 cliluw I read that there are plans to deprecate striping in favor of sharding. What's the difference and why is sharding better?
18:27 JoeJulian @stripe
18:27 glusterbot JoeJulian: Please see http://joejulian.name/blog/should-i-use-stripe-on-glusterfs/ about stripe volumes.
18:29 congpine ohh that blog is yours. I've been reading it, very useful. Thanks @JoeJulian. Im having a situation that need your expert advice.  I just added a pair of bricks (50TB) to my distributed-replicate volume. I haven't ran rebalance yet.  However, one of the existing servers were expelled from the cluster and I removed the 4 bricks (80TB) on that to arrange our RAID.  I will need to add those bricks back in and replace the old bricks t
18:30 congpine @Joejulian ^
18:30 JoeJulian IRC has a max line length. Your question was cut off at "replace the old bricks to"
18:31 JoeJulian btw... I would generally discourage 50TB bricks. They take a really long time to heal when they're replaced and leave you vulnerable for far too long imho.
18:32 JoeJulian I recommend keeping your brick sizes as small as you reasonably can and using multiple bricks per server to provide the capacity.
18:32 micke joined #gluster
18:33 congpine Would you recommend to :  1. Run rebalance first to add new server in (50TB), when it finishes, add the 4 bricks to re-sync with other pairs. 2. add 4 bricks back to the cluster, and then run rebalance.  to minimise the impact to performance?
18:34 congpine @JoeJulian ^
18:34 congpine I need to add 50TB in to expand the storage as we are running out of storage
18:34 cliluw JoeJulian: The blog post doesn't mention anything about sharding.
18:35 JoeJulian Depends on my needs, but more than likely I would add them all back in and just do a rebalance...fix-layout rather than a full rebalance.
18:35 RameshN joined #gluster
18:35 JoeJulian The only time I would run a full rebalance is if a brick was close to filling up and it was getting closer because a file on that brick was growing and likely to grow beyond the remaining brick size.
18:36 bowhunter joined #gluster
18:36 JoeJulian cliluw: Yeah, sharding didn't exist when I wrote that. It's relatively new but a much smarter implementation.
18:37 JoeJulian cliluw: Personally, I avoid breaking up files. I like the idea of having a worst-case recovery scenario where the file actually lives on a drive.
18:37 cliluw JoeJulian: I like that too so I prefer distributed over shard/stripe.
18:38 JoeJulian But for future dedup or for elastic hash, sharding provides a smaller chunk to recalc after changes.
18:42 RameshN joined #gluster
18:43 kotreshhr joined #gluster
18:46 micke joined #gluster
18:50 glustin joined #gluster
18:56 plarsen joined #gluster
19:00 RameshN_ joined #gluster
19:01 RameshN__ joined #gluster
19:03 johnmilton joined #gluster
19:03 derjohn_mobi joined #gluster
19:12 bb joined #gluster
19:23 congpine thanks @JoeJulian.
19:23 cloph hmm. what's the proper way to add a brick to a replica 2 volume to have a replica 3 one? If I use gluster volume add-brick volname replica 3 host:/path/to/brick it horribly breaks VMs running from images stored on "volname". Their root disk get r/o and attempts to remount,rw fail..
19:23 cloph Furthermore there doesn't seem to be any network activity syncing files/healing...
19:23 cloph so I must be missing something.
19:24 congpine What would happen if I re-add the lost replica brick while running fix-layout ( please note that the brick has no data and will need to re-sync with its other pair)
19:33 kovshenin joined #gluster
19:38 RameshN_ joined #gluster
19:39 cloph urgh, even when the volume is idle, and adding a brick and telling it to do a heal full, no data is transferred to the brick.
19:39 cloph and heal info shows no files to be healed, neither does statistics heal-count.
19:39 cloph surefire way to create inconsistencies :-/
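    For reference, a sketch of the replica 2 -> 3 expansion being attempted above, with hypothetical host/brick names; the new brick only gets populated by healing, so a full heal has to be triggered and watched:
        gluster volume add-brick volname replica 3 host3:/path/to/brick
        gluster volume heal volname full
        gluster volume heal volname info                   # should list entries until the new brick is in sync
        gluster volume heal volname statistics heal-count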
19:44 Wojtek JoeJulian: Sounds good, thanks. I'm preparing a 3.8 lab to test the new healing throtling mechanism. Perhaps that's the best aproach to my problem :)
19:46 PotatoGim joined #gluster
19:59 shyam joined #gluster
20:13 Slashman joined #gluster
20:18 glustin joined #gluster
20:20 om hagarth: the only performance issue is on the glusterfs client mount, everything else works fast
20:31 ahino joined #gluster
20:51 pramsky joined #gluster
21:20 B21956 joined #gluster
22:24 jkroon joined #gluster
22:24 jkroon hi all, after upgrading glusterfs from 3.7.4 to 3.7.14 and restarting glusterd I note that the running glusterfsd processes are still the old version.
22:27 jkroon what's the sane way of restarting those?
22:27 jkroon (to fix the glusterfs processes I just umount + mount the mount points).  Would be nice if one could mount -o remount /mount/point but it seems the fuse process doesn't support remount.
22:30 JoeJulian Right, and restarting bricks isn't just restarting the management daemon. Kill a brick and restart it by either restarting glusterd or with "gluster volume start $volname force"
22:30 JoeJulian Make sure self-heals are done before doing that to the replica peer.
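    A sketch of that brick-restart sequence, one server at a time, assuming a volume named mail:
        gluster volume heal mail info        # make sure pending heals are done first
        gluster volume status mail           # note the PID of each local brick (glusterfsd)
        kill <brick-pid>                     # placeholder, the PID taken from the status output
        gluster volume start mail force      # respawns any brick that is not running, now on the new binary
        gluster volume status mail           # confirm the brick came back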
22:42 jkroon joined #gluster
22:42 om guys
22:42 om still having performance issues on glusterfs client mount
22:42 om horrible performance
22:42 jkroon JoeJulian, I disconnected after I responded to you - did you get my two replies?
22:43 om doesn't read/write any faster than 700 Kbps
22:43 om glusterfs is the bottle neck somehow
22:43 om checked all connections
22:43 om speeds
22:44 om rebooted all servers and clients
22:44 om stuck on this
22:45 om any ideas?
22:45 om glusterfs profile seems to be fine
22:45 om here are my config options again
22:45 om https://gist.github.com/andrebron/bd2bb6c6ec7f3ba83e827fd74d70b128
22:45 JoeJulian jkroon: I saw that kshlm worked with you last night.
22:45 glusterbot Title: gist:bd2bb6c6ec7f3ba83e827fd74d70b128 · GitHub (at gist.github.com)
22:46 jkroon JoeJulian, this morning actually.
22:46 JoeJulian om: Why are you using an Arcnet network? ;)
22:46 jkroon well, in my time zone anyway :)
22:46 JoeJulian jkroon: Well, last night to me if I'm sleeping. ;)
22:46 jkroon fair enough :p
22:47 JoeJulian kshlm is a better source for truth than me. He actually makes this stuff, I just use it.
22:47 om fat chance JoeJulian
22:47 om no arcnet
22:47 JoeJulian om: is heal status clean?
22:48 jkroon yes, he was very helpful.  gave me some really good insights as well.  must say, i was always impressed with glusterfs (and your knowledge/understanding) - got a bit disgruntled recently due to major performance problems but with the help of various people on this # most of those are now sorted out.
22:49 jkroon JoeJulian, heal status = volume heal ${volname} info ?
22:49 jkroon i don't think I've ever seen that 100% clean on any of my systems.
22:50 om JoeJulian: yes
22:50 jkroon info split-brain is usually 0 but typically anything between 100 and 200 entries in the full list.
22:54 jkroon what would be super handy is a command like "gluster volume ${volname} restart-bricks" which restarts the glusterfsd processes.
22:54 JoeJulian jkroon: excellent! I'm glad you got the help you needed. I hate being stuck and having nowhere to turn. That's the reason I'm here.
22:55 om nice!
22:55 JoeJulian Even better would be some way to pause a brick, read the data state, kill the brick, start it again and load it with state and continue.
22:56 JoeJulian I'm toying with Red/Green switching using containers...
22:56 om what's recommended to do for glusterfs 3.7.8 performance issues?
22:56 jkroon that's over my head at this point.
22:57 jkroon om, depends on what's causing your problems.
22:57 om it's not the network or the disks or i/o in general
22:57 om it's glusterfs somehow
22:57 jkroon based on my experience two weeks back, some heal stuff needs to be sorted out.
22:57 om I have no heal issues or split-brains
22:58 om just performance issues
22:58 JoeJulian Oh, .8... turn off cluster.{entry,metadata,data}-self-heal (that's client-side self-heals)
22:58 jkroon specifically we had to set cluster.{data,entry,metadata}-self-heal to off, and leave cluster.self-heal-daemon on.
22:58 JoeJulian Just let the self-heal daemon handle it.
22:58 jkroon hahaha, yes.  exactly that :)
22:58 JoeJulian jkroon: beat you by 1/2 a second. ;)
22:58 JoeJulian Or upgrade to a non-broken version.
22:58 om I didn't config any of that
22:59 jkroon yea yea ... you've been at this for a few years longer than me.
22:59 JoeJulian hehe
22:59 om 3.7.8 is a broken version?
22:59 om https://gist.github.com/andrebron/bd2bb6c6ec7f3ba83e827fd74d70b128
22:59 glusterbot Title: gist:bd2bb6c6ec7f3ba83e827fd74d70b128 · GitHub (at gist.github.com)
22:59 jkroon om, the defaults are all four of those on, the client-side ones should be off.
22:59 jkroon dropped our load averages from 100+ to <10.
22:59 JoeJulian That bit and a few memory leaks - mostly gfapi related iirc.
22:59 om (facepalm)
22:59 om ok
23:00 jkroon imho the three non-daemon ones should default to off.
23:00 JoeJulian The latest version is quite solid.
23:00 om what's the latest 3.7.x ?
23:00 JoeJulian latest 3.7 version.
23:00 jkroon JoeJulian, just did my upgrade from 3.7.4 to 3.7.14 (lastest)
23:00 ehermes joined #gluster
23:00 JoeJulian I'm not quite as confident with 3.8 yet, but getting there.
23:01 JoeJulian I've got an install stuck at 3.6 because this company is too afraid to upgrade the ubuntu distro.
23:01 jkroon yea, i'm holding back on 3.8 too but from the looks of it it's a major improvement.  we have been bitten in the past by being too bleeding edge so we're mostly keeping back on major versions until at least a few point-releases have been made.  esp if there are major arch-type changes.
23:02 jkroon based on what i've read at least.
23:04 jkroon JoeJulian, whilst i've got you - i've read your blog on the whole gfid thing a few times now, and from the output of volume heal ${volname} info split-brain I'm still a little baffled as to how to get those resolved gfid ones sorted out in a sensible manner.
23:05 jkroon mostly what I do is I run a find on one of the underlying bricks with (assuming pwd is the brick folder) find . -samefile .gluster/aa/bb/fullgfid ... which then gives me the path.
23:05 hackman joined #gluster
23:05 jkroon I think that assumes it's a file and not a folder ?
23:06 jkroon but with slightly over 5m files on the brick that can take a VERY long time.
23:07 jkroon I'm guessing I can just compare meta-data on the gfid file itself already ... but in the case where I randomly shoot one of the two files (in the cases I've had it's mostly log files so both are really wrong)
23:09 JoeJulian except folders are symlinks in the .glusterfs tree so you can't really check the meta-data.
23:09 jkroon but also - in order to rm one of the two, you need all hard links to it, so my current procedure rm's both the gfid file and the other file - last time I needed that was with 3.5
23:09 jkroon symlinks to the actual folders?
23:09 jkroon doesn't that mean we can use readlink -f to get to the real path?
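    A sketch of both lookups on a brick, with placeholder brick path and gfid; for regular files the .glusterfs entry is a hard link (hence the slow find), for directories it is a symlink, so readlink -f should walk the chain back to the real path:
        BRICK=/data/brick1
        GFID=aabbccdd-1122-3344-5566-77889900aabb        # placeholder
        # regular file: locate the other hard link(s) of the gfid file
        find $BRICK -samefile $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
        # directory: resolve the symlink chain
        readlink -f $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID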
23:09 JoeJulian There's really no valid reason for folders to go split-brain. iirc, there's an open bug about them (unless it was fixed recently)
23:10 JoeJulian @split-brain
23:10 glusterbot JoeJulian: To heal split-brains, see https://gluster.readthedocs.org/en/release-3.7.0/Features/heal-info-and-split-brain-resolution/ For additional information, see this older article https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ Also see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
23:11 JoeJulian son-of-a.... WHY do they keep DOING that?!?!
23:11 jkroon folders  that goes into split brain I've only ever seen for two reasons:  1.  permissions differ on the bricks; 2.  time-stamps differ.
23:11 JoeJulian Put in the damned redirect people!
23:12 JoeJulian I've seen it for no reason at all.
23:12 hagarth joined #gluster
23:12 JoeJulian Just there was a trusted.afr attribute on both replica stating the other one was out of date, but they completely matched.
23:12 amye JoeJulian: you mean @glusterbot's response or?
23:12 jkroon theoretically ANY meta data differences can cause it.  i also read something about files being modified could trigger it - but newer versions handle that specific case.
23:13 jkroon i've got 19 splits at the moment on one cluster, those that are not gfids are all folders.
23:13 JoeJulian amye: docs moving. They get moved but there's no redirect - which I read is something that can be done through some magic file or content so readthedocs will redirect.
23:14 jkroon the split-brain link to the docs results in a 404.
23:14 JoeJulian I fix folder splits by removing all trusted.afr from all directories on the bricks.
23:14 amye JoeJulian: That's why I started a conversation on Docs tooling because you're right, the magic that needs to happen with RTD is not magic :/
23:15 JoeJulian @forget split-brain
23:15 glusterbot JoeJulian: The operation succeeded.
23:16 amye JoeJulian++
23:16 glusterbot amye: JoeJulian's karma is now 29
23:16 JoeJulian @learn split-brain as To heal split brain, rtfm at http://gluster.readthedocs.io/en/latest/Troubleshooting/heal-info-and-split-brain-resolution/ first.
23:16 glusterbot JoeJulian: The operation succeeded.
23:17 jkroon @split-brain
23:17 glusterbot jkroon: To heal split brain, rtfm at http://gluster.readthedocs.io/en/latest/Troubleshooting/heal-info-and-split-brain-resolution/ first.
23:17 JoeJulian @learn split-brain as For versions older than 3.7,  see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
23:17 glusterbot JoeJulian: The operation succeeded.
23:17 jkroon so your blog entry really is historic?  ie, no longer relevant to 3.7 and newer?
23:18 JoeJulian @learn split-brain as For manual brick-level recovery see http://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/
23:18 glusterbot JoeJulian: The operation succeeded.
23:18 JoeJulian @split-brain
23:18 glusterbot JoeJulian: (#1) To heal split brain, rtfm at http://gluster.readthedocs.io/en/latest/Troubleshooting/heal-info-and-split-brain-resolution/ first., or (#2) For versions older than 3.7, see splitmount https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/, or (#3) For manual brick-level recovery see http://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/
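    In short, the 3.7+ CLI those pages cover, plus the directory-xattr approach JoeJulian mentions above (23:14); volume name, brick and file paths are placeholders:
        # policy-based resolution from the CLI
        gluster volume heal volname split-brain bigger-file /path/inside/volume/file
        gluster volume heal volname split-brain source-brick host1:/data/brick1 /path/inside/volume/file
        # manual directory fix: inspect and remove the trusted.afr markers on each brick
        getfattr -d -m trusted.afr -e hex /data/brick1/some/dir
        setfattr -x trusted.afr.volname-client-0 /data/brick1/some/dir   # repeat for each client-N shown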
23:18 amye Huh. That's not what that should be.
23:18 JoeJulian What?
23:18 amye Oh, no, sorry, that's fine - I was misreading the output
23:18 JoeJulian Ah
23:19 JoeJulian You can have multiple factoids for specific keyphrases.
23:19 jkroon JoeJulian, that first link actually perhaps explains what you're seeing.
23:19 JoeJulian Of course even the manual repair documentation is wrong for directories.
23:19 jkroon the folder goes into split-brain due to file in folder being in gfid split brain.
23:20 JoeJulian Not in any of the cases I've looked at.
23:21 JoeJulian I have seen gfid splits, but that usually looks really ugly, multiple copies of filenames and stuff.
23:21 om I upgraded to 3.7.14 in dev
23:21 JoeJulian +1
23:21 om now it won't connect
23:21 om seems tcp port numbers changed again
23:21 JoeJulian @ports
23:21 glusterbot JoeJulian: glusterd's management port is 24007/tcp (also 24008/tcp if you use rdma). Bricks (glusterfsd) use 49152 & up. All ports must be reachable by both servers and clients. Additionally it will listen on 38465-38468/tcp for NFS. NFS also depends on rpcbind/portmap ports 111 and 2049.
23:21 JoeJulian firewalld is supported now, also.
23:21 om client 3.7.14 is trying to connect to 24007
23:21 om but server is listening to 49152
23:22 JoeJulian That's a brick.
23:22 JoeJulian if 24007 isn't listening then glusterd isn't started.
23:22 om it is running
23:23 om and is listening
23:23 hackman joined #gluster
23:24 JoeJulian Then it's listening on 24007.
23:24 jkroon om - recheck.
23:24 JoeJulian glusterfsd is listening on 49152, not glusterd.
23:24 jkroon JoeJulian, that's interesting.
23:24 om ?
23:25 om glusterfsd on 24007
23:25 JoeJulian glusterfsd on 49152, glusterd on 24007.
23:25 jkroon a useful addition would be to be able to query WHY it thinks the file is in split brain, eg: volume heal ${volname} info file <gfid:xxxxx>
23:25 JoeJulian @processes
23:25 glusterbot JoeJulian: The GlusterFS core uses three process names: glusterd (management daemon, one per server); glusterfsd (brick export daemon, one per brick); glusterfs (FUSE client, one per client mount point; also NFS daemon, one per server). There are also two auxiliary processes: gsyncd (for geo-replication) and glustershd (for automatic self-heal).
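    A quick way to check the process/port mapping glusterbot just summarised:
        ss -tlnp | grep gluster        # glusterd on 24007, one glusterfsd per brick on 49152 and up
        gluster volume status myvol    # also prints the TCP port of every brick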
23:25 JoeJulian jkroon: good idea, file a bug report
23:25 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
23:25 jkroon and that then outputs the data from each brick underneath each other
23:26 om nm
23:26 om was wrong dns in fstab
23:26 om :-O
23:36 jkroon JoeJulian, if "volume ${volname} split-brain bigger-file /path/to/file" refuses to heal a file that is not itself listed as being in split brain, but is the cause of the folder split brain - i'm guessing it's manual repair time by removing it straight from the brick?
23:40 jkroon is it possible / why would the same file path map to two different gfid values on different bricks?
23:40 plarsen joined #gluster
23:40 JoeJulian I haven't had a chance to try the setfattr method of repairing split-brain on a directory yet.
23:42 JoeJulian Brick A is up (no quorum) and you create directory "foo". Brick "A" goes down and brick "B" comes up. "foo" doesn't exist so you create it. Now it's been created on two different bricks and each has been assigned a new gfid.
23:42 jkroon oh boy.
23:43 jkroon ok, in my case now it's a file, but yes, that's a very viable scenario seeing that I had a junior tech that wreaked havoc on that system in particular.
23:45 jkroon yea, they're all a bunch of courierpop3dsizelist files inside folders that's causing problems.
23:45 jkroon can't heal them because they have different gfid values.
23:45 jkroon and can't rm them via the mountpoint.
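    A sketch of the manual cleanup for that gfid-mismatch case, run only on the brick whose copy is being discarded; gfid, volume and paths are placeholders, and both the named file and its .glusterfs hard link have to be removed:
        # compare the gfid of the same path on each brick
        getfattr -n trusted.gfid -e hex /data/brick1/path/to/Maildir/courierpop3dsizelist
        # on the losing brick: drop the file plus its gfid hard link, then let the heal recreate it
        GFID=aabbccdd-1122-3344-5566-77889900aabb        # from the getfattr output, re-dashed
        rm /data/brick1/path/to/Maildir/courierpop3dsizelist
        rm /data/brick1/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
        gluster volume heal mail full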
23:46 om well, all upgraded to 3.7.14
23:47 om didn't resolve the performance issue
23:47 jkroon om, did you restart glusterd, all glusterfsd instances as well as re-mounted all the clients?
23:47 jkroon can you pastebin the output of gluster volume info?
23:48 om oh wait
23:48 om testing speed now
23:48 om ls is slow
23:49 om but writes are fast now
23:49 om well, faster
23:49 om 11 Mbps
23:50 JoeJulian A'ight, I'm outta here. Have a good weekend.
23:50 om well rsync is slightly better
23:50 om a Mbps
23:50 om 1
23:51 om but curl to the glusterfs system is about 11 Mbps
23:51 om don't understand why that is
23:55 om rsync --progress ../test500.zip ./
23:55 om test500.zip
23:55 om 60,325,888  11%    1.18MB/s    0:06:24
23:55 om fastest is 1.x MB/s on the glusterfs mount
23:56 om def an improvement from 370 Kbps to 700 Kbps
23:56 om but should be much much faster on a 1 Gbps connection
