Time | Nick | Message
00:08 |
|
gdubreui joined #gluster |
00:21 |
|
_br_ joined #gluster |
00:21 |
|
jfield joined #gluster |
00:31 |
|
_BryanHm_ joined #gluster |
00:55 |
|
harish joined #gluster |
01:55 |
|
diegows joined #gluster |
02:04 |
|
_polto_ joined #gluster |
02:04 |
_polto_ |
hi |
02:04 |
glusterbot |
_polto_: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer. |
02:05 |
_polto_ |
does geo-replication sync files in both directions like the standard replication? |
02:07 |
|
kevein joined #gluster |
02:16 |
|
harish joined #gluster |
02:51 |
|
lalatenduM joined #gluster |
02:59 |
|
vshankar joined #gluster |
03:01 |
JoeJulian |
_polto_: No, geo-replication is unidirectional. |
03:01 |
JoeJulian |
... for now |
03:01 |
|
glusted joined #gluster |
03:02 |
|
zwu joined #gluster |
03:03 |
glusted |
Hello any moderator here? |
03:03 |
glusted |
#gluster-dev |
03:10 |
|
bharata-rao joined #gluster |
03:12 |
JoeJulian |
yes |
03:16 |
|
kshlm joined #gluster |
03:17 |
glusted |
I have Glusterfs version 3.4.0. |
03:17 |
glusted |
1) What is the correct usage of command: gluster volume heal myvolume info heal-failed ? |
03:17 |
glusted |
When I type this command, I get a list of files: |
03:17 |
glusted |
Ex: |
03:17 |
glusted |
2013-11-14 03:07:52 <gfid:fd1d018e-38ae-444c-a069-91528b9871dd>/10.jpg |
03:17 |
glusted |
2013-11-14 03:07:51 <gfid:fd1d018e-38ae-444c-a069-91528b9871dd>/1.jpg |
03:17 |
glusted |
In fact, I get this: |
03:17 |
glusted |
[bob server]# gluster volume heal myvolume info heal-failed | grep -i number |
03:17 |
glusted |
Number of entries: 6 |
03:17 |
glusted |
Number of entries: 68 |
03:17 |
glusted |
So on my 2 bricks, I have a total of 74 "heal-failed" files. |
03:17 |
glusted |
2) When I do gluster volume heal myvolume and/or gluster volume heal myvolume full, then I type again the gluster volume heal myvolume info heal-failed, I get the same number... |
03:17 |
glusted |
In fact it is saying that the command was successful (Launching Heal operation on volume myvolume has been successful Use heal info commands to check status)... |
03:17 |
glusted |
How do I remove those files so they don't appear in "heal-failed"? Do I want to remove them? My understanding is that this command should only show the files that have not been healed, not some relics of the past. |
03:18 |
glusted |
woopsie |
03:34 |
|
shubhendu joined #gluster |
03:40 |
|
harish joined #gluster |
03:53 |
glusted |
I have Glusterfs version 3.4.0. |
03:54 |
glusted |
1) What is the correct usage of command: gluster volume heal myvolume info heal-failed ? |
03:54 |
glusted |
When I type this command, I get a list of files: |
03:54 |
glusted |
Ex: |
03:54 |
glusted |
2013-11-14 03:07:52 <gfid:fd1d018e-38ae-444c-a069-91528b9871dd>/10.jpg |
03:54 |
glusted |
2013-11-14 03:07:51 <gfid:fd1d018e-38ae-444c-a069-91528b9871dd>/1.jpg |
03:55 |
|
vpshastry joined #gluster |
03:55 |
glusted |
In fact, I get this: |
03:55 |
glusted |
[bob server]# gluster volume heal myvolume info heal-failed | grep -i number |
03:55 |
glusted |
Number of entries: 6 |
03:55 |
glusted |
Number of entries: 68 |
03:55 |
glusted |
So on my 2 bricks, I have a total of 74 "heal-failed" files. |
03:55 |
glusted |
2) When I do gluster volume heal myvolume and/or gluster volume heal myvolume full, then I type again the gluster volume heal myvolume info heal-failed, I get the same number... |
03:55 |
glusted |
In fact it is saying that the command was successful (Launching Heal operation on volume myvolume has been successful Use heal info commands to check status)... |
03:56 |
glusted |
How do I remove those files so they don't appear in "heal-failed"? Do I want to remove them? My understanding is that this command should only show the files that have not been healed, not some relics of the past. |
03:56 |
glusted |
4) About logging which log should I check to know why I have "heal-failed"? |
03:56 |
glusted |
I found the log directory, but I have plenty of logs, including brick1 and brick2 logs. I looked at them, but have not found the root cause, yet |
03:56 |
glusted |
5) I can not find those files marked as "heal-failed", can someone tell me a hint or explanation? (for example what is this: gfid:fd1d018e-38ae-444c-a069-91528b9871dd) |
03:56 |
glusted |
End of questions- |
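For reference, the heal commands glusted describes, plus one way to turn a <gfid:...> entry back into a real path on a brick, look roughly like the sketch below. The brick path /data/brick1 is an assumption, the volume name comes from the questions above, and the .glusterfs lookup only works while that gfid entry still exists on the brick.

    # list entries the self-heal daemon failed on, then trigger heals
    gluster volume heal myvolume info heal-failed
    gluster volume heal myvolume
    gluster volume heal myvolume full

    # every object on a brick also lives under .glusterfs/<aa>/<bb>/<gfid>
    BRICK=/data/brick1                               # assumed brick path
    GFID=fd1d018e-38ae-444c-a069-91528b9871dd
    # a directory gfid is a symlink pointing back toward its real name
    ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
    # a file gfid is a hard link; locate the real path via the shared inode
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" ! -path "*/.glusterfs/*"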
03:59 |
|
sgowda joined #gluster |
04:02 |
|
shylesh joined #gluster |
04:12 |
|
lpabon joined #gluster |
04:19 |
|
meghanam joined #gluster |
04:19 |
|
meghanam_ joined #gluster |
04:22 |
|
raghug joined #gluster |
04:27 |
|
zwu joined #gluster |
04:28 |
|
mohankumar joined #gluster |
04:36 |
|
RameshN joined #gluster |
04:38 |
|
kanagaraj joined #gluster |
04:39 |
|
davinder joined #gluster |
04:39 |
|
MiteshShah joined #gluster |
04:40 |
|
ababu joined #gluster |
04:47 |
|
ndarshan joined #gluster |
04:50 |
|
ppai joined #gluster |
04:50 |
|
vpshastry joined #gluster |
04:51 |
|
shruti joined #gluster |
04:53 |
|
hagarth joined #gluster |
05:01 |
|
itisravi joined #gluster |
05:08 |
glusted |
Anyone? |
05:13 |
|
kshlm joined #gluster |
05:16 |
|
CheRi joined #gluster |
05:17 |
|
_pol joined #gluster |
05:21 |
|
aravindavk joined #gluster |
05:24 |
|
msolo joined #gluster |
05:34 |
|
bala joined #gluster |
05:40 |
|
raghu joined #gluster |
05:41 |
|
msolo joined #gluster |
05:42 |
|
gmcwhistler joined #gluster |
05:46 |
|
ababu joined #gluster |
05:48 |
|
kaushal_ joined #gluster |
05:51 |
|
raghug joined #gluster |
05:51 |
|
dusmant joined #gluster |
05:59 |
|
ndarshan joined #gluster |
06:02 |
|
msolo joined #gluster |
06:02 |
|
bulde joined #gluster |
06:10 |
glusted |
Anyone? |
06:12 |
|
msolo joined #gluster |
06:14 |
bulde |
glusted: a few should be around to answer any queries... |
06:18 |
|
psharma joined #gluster |
06:19 |
|
nshaikh joined #gluster |
06:27 |
|
msolo joined #gluster |
06:35 |
msolo |
my volume seems to be wedged - i tried to run "volume replace-brick" and I am consistently getting this message: "volume replace-brick: failed: Commit failed on localhost. Please check the log file for more details" |
06:35 |
msolo |
The log files look pretty empty |
06:39 |
|
krypto joined #gluster |
06:41 |
|
vpshastry joined #gluster |
06:43 |
|
vimal joined #gluster |
06:47 |
|
vshankar joined #gluster |
06:52 |
_polto_ |
JoeJulian, thanks. Do you know what the timeframe for "now" is, and when glusterfs geo-replication will be bi-directional? |
06:54 |
|
kshlm joined #gluster |
06:58 |
|
ricky-ti1 joined #gluster |
07:03 |
|
aravindavk joined #gluster |
07:04 |
|
raghug joined #gluster |
07:04 |
|
ngoswami joined #gluster |
07:04 |
|
shruti joined #gluster |
07:10 |
|
vshankar joined #gluster |
07:17 |
|
msolo joined #gluster |
07:21 |
|
zeedon2 joined #gluster |
07:21 |
zeedon2 |
having an issue with a GFS replica if anyone has any insight |
07:21 |
zeedon2 |
have a directory which contains quite a lot of files |
07:22 |
zeedon2 |
and in the volume heal info it shows the entire directory listed not just individual files |
07:22 |
zeedon2 |
and currently cannot access anything in there or create any new files |
07:22 |
zeedon2 |
rest of the volume is fine... |
07:23 |
|
jtux joined #gluster |
07:27 |
|
ngoswami joined #gluster |
07:29 |
glusted |
what does the split-brain command show for you? |
07:32 |
|
zeed joined #gluster |
07:32 |
zeed |
glusted: nothing is split-brain |
07:32 |
|
davinder joined #gluster |
07:32 |
zeed |
nothing in* |
07:33 |
zeed |
just to add, files from that dir do come up in the to-be-healed list but the whole directory is always there |
07:34 |
glusted |
sorry I don't know then - I noticed the "heal" thing is strange |
07:34 |
glusted |
see my questions way above- |
07:34 |
glusted |
I do "heal myvolume full" or heal my volume" it changes nothing. |
07:35 |
glusted |
I suspect the "heal info" always keeps the memory of old issues rather than showing the current state |
07:35 |
zeed |
heal info shows files that need to be healed, right? |
07:36 |
glusted |
well my files always need to be healed then... |
07:36 |
glusted |
well my files always need to be healed then... do you see them in "heal-failed"? |
07:36 |
glusted |
I did a gluster.... heal-failed | grep -i number |
07:36 |
glusted |
and it shows me the number |
07:37 |
|
msolo joined #gluster |
07:38 |
|
aravindavk joined #gluster |
07:42 |
|
ngoswami joined #gluster |
07:47 |
|
ekuric joined #gluster |
07:50 |
|
jtux joined #gluster |
07:50 |
|
shruti joined #gluster |
07:53 |
|
badone joined #gluster |
07:58 |
|
marcoceppi joined #gluster |
08:00 |
|
ctria joined #gluster |
08:09 |
|
eseyman joined #gluster |
08:19 |
|
raghug joined #gluster |
08:20 |
|
_polto_ joined #gluster |
08:20 |
|
keytab joined #gluster |
08:29 |
|
raghug joined #gluster |
08:37 |
|
raghug joined #gluster |
08:39 |
|
zeedon2 joined #gluster |
08:39 |
|
hybrid512 joined #gluster |
08:47 |
|
vpshastry1 joined #gluster |
08:48 |
|
andreask joined #gluster |
08:51 |
|
hagarth joined #gluster |
08:52 |
|
bigclouds_ joined #gluster |
08:52 |
bigclouds_ |
hi |
08:52 |
glusterbot |
bigclouds_: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer. |
08:53 |
bigclouds_ |
i have mounted glusterfs successfully, but when i mkdir, it reports 'No such file or directory' |
08:54 |
bigclouds_ |
glusterfs fs type |
08:54 |
|
raghug joined #gluster |
08:55 |
|
_polto_ joined #gluster |
08:58 |
|
satheesh joined #gluster |
09:01 |
|
satheesh2 joined #gluster |
09:02 |
|
franc joined #gluster |
09:02 |
|
franc joined #gluster |
09:12 |
samppah |
@firewall |
09:12 |
samppah |
@ports |
09:12 |
glusterbot |
samppah: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up for <3.4 and 49152 & up for 3.4. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4. |
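Translated into firewall rules, glusterbot's port list for a 3.4 server looks roughly like this sketch. It assumes iptables and a handful of bricks; widen or narrow the 49152 range to match the number of bricks actually exported, and only open 24008 if rdma is in use.

    # glusterd management
    iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT
    # brick ports: 49152 and up for glusterfs 3.4 (24009 and up for older releases)
    iptables -A INPUT -p tcp --dport 49152:49160 -j ACCEPT
    # gluster NFS and NLM
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT
    # rpcbind/portmap and NFS
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT
    iptables -A INPUT -p udp --dport 111 -j ACCEPT
    iptables -A INPUT -p tcp --dport 2049 -j ACCEPT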
09:20 |
|
geewiz joined #gluster |
09:21 |
|
dusmant joined #gluster |
09:22 |
|
shubhendu joined #gluster |
09:22 |
|
kanagaraj joined #gluster |
09:23 |
|
ndarshan joined #gluster |
09:23 |
|
RameshN joined #gluster |
09:24 |
|
vpshastry joined #gluster |
09:25 |
|
aravindavk joined #gluster |
09:28 |
|
bala joined #gluster |
09:29 |
|
gdubreui joined #gluster |
09:35 |
|
t4k3sh1 joined #gluster |
09:36 |
t4k3sh1 |
Dear bluster |
09:36 |
t4k3sh1 |
Dear Gluster |
09:36 |
t4k3sh1 |
only you can help me :( |
09:36 |
t4k3sh1 |
i searched google many times and tried many things but the problem still exists |
09:36 |
t4k3sh1 |
i'm running glusterfs 3.4.1 built on Sep 27 2013 21:32:53 on Debian |
09:37 |
|
T0aD joined #gluster |
09:37 |
t4k3sh1 |
and since power failure i have Input/Output problems with some directories/files |
09:37 |
t4k3sh1 |
i tried rsync with fresh files and so on |
09:38 |
t4k3sh1 |
but when using glusterfs-client, even after fresh rsync on ALL bricks - i see input/output error |
09:38 |
t4k3sh1 |
setup is distributed/replicated |
09:38 |
t4k3sh1 |
Please point me where should i dig? |
09:39 |
t4k3sh1 |
i found some threads about bugs in version 3.3, but mine is 3.4.1 |
09:39 |
t4k3sh1 |
i have normal xattrs like that |
09:39 |
t4k3sh1 |
# file: storage/1/etc/apache2/ |
09:39 |
t4k3sh1 |
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff |
09:39 |
t4k3sh1 |
on all bricks |
09:40 |
t4k3sh1 |
absolutely no idea what to do next :( |
09:42 |
hagarth |
t4k3sh1: looks like a split brain situation |
09:42 |
t4k3sh1 |
plz someone, just give a little hint :( |
09:43 |
t4k3sh1 |
yeah, tried already to sync files manually with stopped glusterfs on all bricks |
09:43 |
t4k3sh1 |
doesn't help :( still see input/output error after glusterfs start |
09:44 |
hagarth |
t4k3sh1: have you checked this - https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md ? |
09:44 |
glusterbot |
<http://goo.gl/fDwdMX> (at github.com) |
09:46 |
t4k3sh1 |
hagarth: oh! gimme few mins, will check whole article and check all steps again |
09:47 |
t4k3sh1 |
thank you very much |
09:50 |
|
64MAA0QO0 joined #gluster |
09:58 |
t4k3sh1 |
hagarth: em... it seems something wrong with AMOUNT of trusted.afr values... |
09:58 |
|
muhh joined #gluster |
09:58 |
hagarth |
t4k3sh1: lot of files/directories have wrong values? |
09:58 |
t4k3sh1 |
i have different counts of trusted.afr.volume-client-X for the files |
09:58 |
t4k3sh1 |
sec |
10:01 |
t4k3sh1 |
root n2:/storage/1/etc/apache2# getfattr -d -e hex -m . /storage/3/etc/apache2/ispcp/ |
10:01 |
t4k3sh1 |
getfattr: Removing leading '/' from absolute path names |
10:01 |
t4k3sh1 |
# file: storage/3/etc/apache2/ispcp/ |
10:01 |
t4k3sh1 |
trusted.afr.volume-client-4=0x000000000000000000000000 |
10:01 |
t4k3sh1 |
trusted.afr.volume-client-5=0x000000000000000000000000 |
10:01 |
t4k3sh1 |
trusted.afr.volume-client-6=0x000000000000000000000000 |
10:01 |
t4k3sh1 |
trusted.afr.volume-client-7=0x000000000000000000000000 |
10:01 |
t4k3sh1 |
trusted.gfid=0x7dc9b13e204542fcb8a12944347b67d0 |
10:01 |
t4k3sh1 |
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd |
10:01 |
t4k3sh1 |
root n2:/storage/1/etc/apache2# getfattr -d -e hex -m . /storage/4/etc/apache2/ispcp/ |
10:01 |
t4k3sh1 |
getfattr: Removing leading '/' from absolute path names |
10:01 |
t4k3sh1 |
# file: storage/4/etc/apache2/ispcp/ |
10:01 |
t4k3sh1 |
trusted.afr.volume-client-6=0x000000000000000000000000 |
10:01 |
t4k3sh1 |
trusted.afr.volume-client-7=0x000000000000000000000000 |
10:01 |
t4k3sh1 |
directory ispcp gives input/output error |
10:02 |
t4k3sh1 |
so on brick3 (storage3) i see four trusted.afr attributes (i have 2 nodes, with 4 bricks on each, distributed replicated) |
10:02 |
t4k3sh1 |
on storage 4, i see just two trusted.afr volumes |
10:02 |
t4k3sh1 |
is that ok? |
10:04 |
|
pk joined #gluster |
10:04 |
hagarth |
t4k3sh1: pk is the split-brain expert :) |
10:05 |
t4k3sh1 |
pk, can you please spend few mins with me? |
10:05 |
|
franc joined #gluster |
10:05 |
t4k3sh1 |
hagarth: i mean, maybe there is no access to that file/dir on some bricks, somehow caused by the absence of trusted.afr attributes for some clients? |
10:06 |
glusted |
I do "heal myvolume full" or heal my volume" it changes nothing. |
10:06 |
|
kanagaraj joined #gluster |
10:07 |
pk |
t4k3sh1: Looking at the info you provided hagarth.... gimme a min |
10:07 |
hagarth |
t4k3sh1: don't think so, but am curious as to how client-4 and client-5 xattrs got on to the brick in storage/3 |
10:07 |
glusted |
I fixed my SPLIT BRAIN issue on Glusterfs 3.4 by deleting the "split-brain" files on Both bricks... |
10:07 |
glusted |
it "worked" |
10:10 |
|
franc joined #gluster |
10:10 |
pk |
t4k3sh1: directory /storage/4/etc/apache2/ispcp/ does not have gfid and trusted.glusterfs.dht extended attributes.... |
10:12 |
pk |
t4k3sh1: I was just wondering how the directory ended up without those xattrs |
10:12 |
pk |
t4k3sh1: Do you have the logs from the mount which actually gives Input/Output error? |
10:13 |
t4k3sh1 |
just a sec |
10:13 |
|
ndarshan joined #gluster |
10:13 |
t4k3sh1 |
pk, sorry, but why "does not have"? it does have them: |
10:14 |
t4k3sh1 |
root n2:/storage/1/etc/apache2# getfattr -d -e hex -m . /storage/4/etc/apache2/ispcp/ |
10:14 |
t4k3sh1 |
getfattr: Removing leading '/' from absolute path names |
10:14 |
t4k3sh1 |
# file: storage/4/etc/apache2/ispcp/ |
10:14 |
t4k3sh1 |
trusted.afr.volume-client-6=0x000000000000000000000000 |
10:14 |
t4k3sh1 |
trusted.afr.volume-client-7=0x000000000000000000000000 |
10:14 |
t4k3sh1 |
trusted.gfid=0x7dc9b13e204542fcb8a12944347b67d0 |
10:14 |
t4k3sh1 |
trusted.glusterfs.dht=0x00000001000000003fffffff7ffffffd |
10:14 |
|
aravindavk joined #gluster |
10:15 |
pk |
t4k3sh1: how many nodes do you have? |
10:15 |
t4k3sh1 |
the only difference is that the same dir on "storage 3" has FOUR trusted.afr.volume... attributes, while on storage 4 only two |
10:16 |
pk |
t4k3sh1: It does not matter |
10:16 |
t4k3sh1 |
i have 2 nodes with 4 bricks on each |
10:16 |
t4k3sh1 |
distributed replicated setup |
10:16 |
|
bala joined #gluster |
10:16 |
t4k3sh1 |
problem appeared after power failure |
10:16 |
pk |
t4k3sh1: cool, so 4x2 setup |
10:16 |
t4k3sh1 |
there was a fire in datacenter, so... |
10:16 |
t4k3sh1 |
pk right 4x2 |
10:17 |
pk |
t4k3sh1: got it t4k3sh1, could you give the output of getfattr -d -m. -e hex <brick-dir[1-8]>/etc/apache2/ispcp/ |
10:17 |
pk |
I mean for all the 8 bricks lets get the outputs |
10:17 |
t4k3sh1 |
sure, sec |
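Collecting what pk asked for can be scripted in one pass per node; a minimal sketch, assuming the four bricks on each node are mounted at /storage/1 through /storage/4 as in the paste above:

    # run on both nodes and compare trusted.afr.*, trusted.gfid and
    # trusted.glusterfs.dht across every copy of the directory
    for b in /storage/{1..4}; do
        echo "== $b =="
        getfattr -d -m . -e hex "$b/etc/apache2/ispcp/"
    done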
10:17 |
|
dusmant joined #gluster |
10:17 |
t4k3sh1 |
in pm? |
10:17 |
pk |
t4k3sh1: pm |
10:18 |
pk |
t4k3sh1: Also get the logs which say Input/Output error on the mount |
10:20 |
|
X3NQ joined #gluster |
10:20 |
|
shubhendu joined #gluster |
10:21 |
|
RameshN joined #gluster |
10:23 |
cyberbootje |
Hi, i'm having some issues with gluster storage. Setup is replica 2 and something went wrong with the network backbone and now brick2 is detached from the network |
10:24 |
glusted |
Would it help you to do this: gluster volume statedump yourvolume? |
10:24 |
|
t4k3sh1_ joined #gluster |
10:24 |
pk |
t4k3sh1_: hi |
10:24 |
pk |
t4k3sh1_: please paste the getfattr output in pm |
10:25 |
cyberbootje |
you can guess what i want to do, reconnect brick2, but it has to be 100% safe, we are talking about 1.3TB data so what is best? delete the files on brick 2 before i reconnect? |
10:26 |
cyberbootje |
backbone is 10Gbit p/sec |
10:28 |
|
dneary joined #gluster |
10:34 |
|
StarBeast joined #gluster |
10:36 |
glusted |
gluster volume status yourvolume detail |
10:36 |
|
Norky joined #gluster |
10:42 |
|
MarkR joined #gluster |
10:42 |
MarkR |
On both my bricks, I get every 10 minutes in my logs: |
10:42 |
MarkR |
E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-GLUSTER-SHARE-replicate-0: Unable to self-heal contents of '<gfid:00000000-0000-0000-0000-000000000001>' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ] |
10:42 |
MarkR |
I removed (and recreated) .glusterfs/00/00/00000000-0000-0000-0000-000000000001 to no avail. What's going on?? |
10:45 |
glusted |
gluster volume heal yourvolume info split-brain |
10:45 |
glusted |
Or gluster volume heal yourvolume info split-brain | grep -i number |
10:48 |
pk |
MarkR: Could you give the getfattr -d -m. -e hex output of the bricks please |
10:48 |
MarkR |
getfattr -d -m. -e hex /data/export-home-1 |
10:48 |
MarkR |
getfattr: Removing leading '/' from absolute path names |
10:49 |
MarkR |
# file: data/export-home-1 |
10:49 |
MarkR |
trusted.afr.GLUSTER-HOME-client-0=0x000000000000000000000000 |
10:49 |
MarkR |
trusted.afr.GLUSTER-HOME-client-1=0x000000000000000200000000 |
10:49 |
MarkR |
trusted.gfid=0x00000000000000000000000000000001 |
10:49 |
MarkR |
trusted.glusterfs.dht=0x000000010000000000000000ffffffff |
10:49 |
MarkR |
trusted.glusterfs.quota.dirty=0x3000 |
10:49 |
MarkR |
trusted.glusterfs.quota.size=0x0000000000034800 |
10:49 |
MarkR |
trusted.glusterfs.volume-id=0x3ff12b3d51a141f7912c453b33adb6fd |
10:49 |
pk |
what about the other one.... |
10:49 |
pk |
MarkR: Please give output from the other brick as well |
10:49 |
MarkR |
getfattr -d -m. -e hex /data/export-home-2 |
10:49 |
MarkR |
[sudo] password for markr: |
10:49 |
MarkR |
getfattr: Removing leading '/' from absolute path names |
10:49 |
MarkR |
# file: data/export-home-2 |
10:49 |
MarkR |
trusted.afr.GLUSTER-HOME-client-0=0x000000000000000200000000 |
10:50 |
MarkR |
trusted.afr.GLUSTER-HOME-client-1=0x000000000000000000000000 |
10:50 |
MarkR |
trusted.gfid=0x00000000000000000000000000000001 |
10:50 |
MarkR |
trusted.glusterfs.dht=0x000000010000000000000000ffffffff |
10:50 |
MarkR |
trusted.glusterfs.quota.dirty=0x3000 |
10:50 |
MarkR |
trusted.glusterfs.quota.size=0x0000000000034a00 |
10:50 |
MarkR |
trusted.glusterfs.volume-id=0x3ff12b3d51a141f7912c453b33adb6fd |
10:50 |
pk |
MarkR: Seems like a metadata split-brain... don't panic |
10:50 |
MarkR |
:) |
10:50 |
pk |
MarkR: Could you give stat output of both the bricks |
10:50 |
_polto_ |
is it possible to switch between sync and async (geo) replication ? |
10:51 |
MarkR |
# gluster volume status |
10:51 |
MarkR |
Status of volume: GLUSTER-HOME |
10:51 |
MarkR |
Gluster process    Port    Online    Pid |
10:51 |
MarkR |
------------------------------------------------------------------------------ |
10:51 |
MarkR |
Brick file1:/data/export-home-1    49153    Y    6101 |
10:51 |
MarkR |
Brick file2:/data/export-home-2    49153    Y    4015 |
10:51 |
MarkR |
Self-heal Daemon on localhost    N/A    Y    28025 |
10:51 |
MarkR |
Self-heal Daemon on file1    N/A    Y    19849 |
10:51 |
MarkR |
|
10:51 |
bulde |
_polto_: geo-replication is done from cluster to cluster, and the replication (the one chosen during volume creation) is done internally within the cluster |
10:51 |
MarkR |
There are no active volume tasks |
10:51 |
MarkR |
Status of volume: GLUSTER-SHARE |
10:51 |
MarkR |
Gluster process    Port    Online    Pid |
10:51 |
MarkR |
------------------------------------------------------------------------------ |
10:52 |
MarkR |
Brick file1:/data/export-share-1    49152    Y    6106 |
10:52 |
MarkR |
Brick file2:/data/export-share-2    49152    Y    4023 |
10:52 |
MarkR |
Self-heal Daemon on localhost    N/A    Y    28025 |
10:52 |
MarkR |
Self-heal Daemon on file1    N/A    Y    19849 |
10:52 |
MarkR |
|
10:52 |
MarkR |
There are no active volume tasks |
10:52 |
|
harish joined #gluster |
10:52 |
_polto_ |
I have a remote server that normally act as a backup server, but sometimes I am more near to that remote server and would like to use it as the main server. |
10:52 |
_polto_ |
bulde, oh ... :( |
10:53 |
_polto_ |
bulde, so it's not possible to switch between ? |
10:53 |
pk |
MarkR: Sorry, I was looking for 'stat <brick-dir-path>' output |
10:53 |
pk |
MarkR: Not the 'volume status' output |
10:53 |
_polto_ |
I need async (because of the slow ADSL connection) but bi-directional. |
10:53 |
bulde |
_polto_: as far as I know, no... they are not designed to be switchable |
10:54 |
_polto_ |
any smart ways to do that ? |
10:54 |
MarkR |
# stat /data/export-home-1 |
10:54 |
MarkR |
File: `/data/export-home-1' |
10:54 |
MarkR |
Size: 4096 Blocks: 16 IO Block: 4096 directory |
10:54 |
MarkR |
Device: ca01h/51713d    Inode: 1447203    Links: 11 |
10:54 |
MarkR |
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) |
10:54 |
MarkR |
Access: 2013-11-23 22:38:31.279290000 +0100 |
10:54 |
MarkR |
Modify: 2013-11-23 23:13:38.677966701 +0100 |
10:54 |
MarkR |
Change: 2013-11-25 11:49:02.667240374 +0100 |
10:54 |
MarkR |
Birth: - |
10:54 |
MarkR |
# stat /data/export-home-2 |
10:54 |
MarkR |
File: `/data/export-home-2' |
10:54 |
MarkR |
Size: 4096 Blocks: 16 IO Block: 4096 directory |
10:54 |
MarkR |
Device: ca01h/51713d    Inode: 525262    Links: 11 |
10:54 |
MarkR |
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) |
10:54 |
MarkR |
Access: 2013-11-23 22:38:31.283210000 +0100 |
10:54 |
MarkR |
Modify: 2013-11-23 23:13:38.673338576 +0100 |
10:54 |
MarkR |
Change: 2013-11-25 11:49:02.666092633 +0100 |
10:54 |
MarkR |
Birth: - |
10:55 |
bulde |
_polto_: for that you have to wait for 'master <-> master' geo-replication (planned here : http://www.gluster.org/community/documentation/index.php/Arch/Change_Logging_Translator_Design#Consumers 1st point)... |
10:55 |
glusterbot |
<http://goo.gl/eqWajw> (at www.gluster.org) |
10:55 |
bulde |
_polto_: its not in yet |
10:58 |
|
gdubreui joined #gluster |
10:58 |
gdubreui |
derekh, ping |
10:59 |
MarkR |
I upgraded from 3.3 to 3.4. Had some difficulties and decided to recreate the volumes from scratch. Then I did mv /data/export-home-old/* /data/export-home-1/ (skipping .glusterfs). This might have caused the split brain. |
11:02 |
|
lalatenduM joined #gluster |
11:09 |
|
glusted left #gluster |
11:09 |
pk |
MarkR: Probably, but it seems fine just do the following |
11:10 |
pk |
MarkR: setfattr -n trusted.afr.GLUSTER-HOME-client-0 -v 0x000000000000000000000000 data/export-home-2 |
11:12 |
pk |
MarkR: After that things should be fine |
11:13 |
MarkR |
pk: Now on both bricks I get all zero's trusted.afr..* - looks fine! |
11:13 |
pk |
MarkR: And thats the end of it :-) |
11:14 |
_polto_ |
bulde, thanks |
11:18 |
MarkR |
pk: absolutely. I just got a split brain error for the other volume - now I know how to fix that. pk, I owe you! 8) |
11:23 |
pk |
MarkR: You should be careful though.... If you do it wrong, data will be deleted. Here you have the simplest of split-brains and your stat and getfattr output looked almost the same, so we just needed to set one of them to all zeros |
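pk's fix generalizes to other simple metadata split-brains on a replica pair; the sketch below reuses the names from MarkR's volume and carries the same caveat, since zeroing the wrong xattr tells self-heal to overwrite the copy you wanted to keep.

    # 1. inspect both copies; each brick's trusted.afr.<vol>-client-N xattr
    #    records pending operations it holds against the other brick
    getfattr -d -m . -e hex /data/export-home-1
    getfattr -d -m . -e hex /data/export-home-2
    stat /data/export-home-1 /data/export-home-2

    # 2. on the copy you are willing to discard, clear its accusation of the
    #    other brick; the untouched brick then becomes the heal source
    setfattr -n trusted.afr.GLUSTER-HOME-client-0 -v 0x000000000000000000000000 /data/export-home-2

    # 3. let self-heal reconcile, then confirm nothing is still split
    gluster volume heal GLUSTER-HOME
    gluster volume heal GLUSTER-HOME info split-brain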
11:23 |
|
bigclouds joined #gluster |
11:31 |
|
lpabon joined #gluster |
11:31 |
|
bigclouds joined #gluster |
11:34 |
|
franc joined #gluster |
11:34 |
|
franc joined #gluster |
11:35 |
|
kanagaraj joined #gluster |
11:40 |
|
bala joined #gluster |
11:42 |
|
bigclouds joined #gluster |
11:48 |
|
shubhendu joined #gluster |
11:52 |
|
dusmant joined #gluster |
11:52 |
|
bigclouds joined #gluster |
11:52 |
|
ndarshan joined #gluster |
11:53 |
|
RameshN joined #gluster |
11:54 |
|
aravindavk joined #gluster |
12:03 |
MarkR |
Does anyone know a decent Nagios plugin to monitor the gluster log for split brain errors and such? |
12:15 |
|
bigclouds joined #gluster |
12:17 |
|
calum_ joined #gluster |
12:17 |
|
ipvelez joined #gluster |
12:19 |
|
CheRi joined #gluster |
12:21 |
|
ppai joined #gluster |
12:26 |
samppah |
@nagios |
12:27 |
samppah |
semiosis: did you have solution for monitoring gluster logs with nagios? |
12:29 |
|
shubhendu joined #gluster |
12:29 |
|
aravindavk joined #gluster |
12:30 |
|
ndarshan joined #gluster |
12:35 |
|
_polto_ joined #gluster |
12:36 |
|
bala joined #gluster |
12:42 |
|
keytab joined #gluster |
12:47 |
JoeJulian |
@later tell glusted "2013-11-14 03:07:52 <gfid:fd1d018e-38ae-444c-a069-91528b9871dd>/10.jpg" says that on the 14th at 3am UTC there was a failed heal on that file in the directory referenced by that gfid. To find out why it failed, you'd need to check the glustershd.log files. To clear those logs, restart glusterd. |
12:47 |
glusterbot |
JoeJulian: The operation succeeded. |
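Spelled out, JoeJulian's note amounts to two steps; a sketch assuming the default log location and a sysvinit-style service script:

    # the self-heal daemon records why each heal failed; search its log for
    # the gfid shown in the heal-failed listing
    grep fd1d018e-38ae-444c-a069-91528b9871dd /var/log/glusterfs/glustershd.log
    less /var/log/glusterfs/glustershd.log

    # the heal-failed listing is history kept by glusterd; restarting
    # glusterd clears it, per JoeJulian above
    service glusterd restart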
12:48 |
JoeJulian |
_polto_: 3.7 last I heard for bidirectional geo-sync. |
12:51 |
|
hagarth joined #gluster |
12:51 |
|
dusmant joined #gluster |
12:52 |
|
kanagaraj joined #gluster |
12:53 |
|
rcheleguini joined #gluster |
12:54 |
_polto_ |
JoeJulian, 3.7 ? now we are at 3.4 .. :/ |
12:55 |
JoeJulian |
_polto_: If you use a gluster volume at both ends, and geosync from "local" to "remote", you can stop geosync on the "remote" when you physically change locations and start geosync on "local". |
12:56 |
|
RameshN joined #gluster |
12:56 |
MarkR |
The best Nagios plugin I found is http://exchange.nagios.org/directory/Plugins/System-Metrics/File-System/Check_Gluster/details. It parses gluster volume info, so that's a start. |
12:56 |
glusterbot |
<http://goo.gl/q5A8ne> (at exchange.nagios.org) |
12:57 |
JoeJulian |
samppah: You can see what semiosis does in his ,,(puppet) module. |
12:57 |
glusterbot |
samppah: (#1) https://github.com/purpleidea/puppet-gluster, or (#2) semiosis' unmaintained puppet module: https://github.com/semiosis/puppet-gluster |
12:59 |
_polto_ |
JoeJulian, I have two servers running samba service. One is on ADSL where I work usually on the local server and the other is in a datacenter with 1Gbps internet. If I am travelling I would like to use the one in the datacenter. |
12:59 |
kkeithley |
3.5 will be out soon. On a six-month release cadence, 3.7 is a little over a year away |
13:00 |
JoeJulian |
assuming the prerequisites make their targeted releases... |
13:01 |
_polto_ |
JoeJulian, and some people are still in the office and are using the LAN server. |
13:02 |
|
LoudNoises joined #gluster |
13:02 |
_polto_ |
I hoped glusterfs would be the ideal solution for us, but it seems the features we need are not yet implemented. |
13:03 |
JoeJulian |
_polto_: Just put everyone on terminal server in the CoLo. Problem solved. ;) |
13:04 |
|
diegows joined #gluster |
13:05 |
_polto_ |
JoeJulian, but our line is too slow for that. |
13:05 |
_polto_ |
2Mbps |
13:05 |
_polto_ |
ADSL |
13:06 |
JoeJulian |
How many users? |
13:06 |
_polto_ |
4 |
13:07 |
_polto_ |
and we have DXF and other big files (10-400MB) to transfer.. |
13:08 |
JoeJulian |
Yeah, I'd certainly consider testing and RDP solution. |
13:08 |
JoeJulian |
a/and/an/ |
13:09 |
_polto_ |
RDP ? |
13:09 |
JoeJulian |
... too early to type apparently... |
13:09 |
JoeJulian |
@lucky RDP |
13:09 |
glusterbot |
JoeJulian: http://goo.gl/MiTj |
13:10 |
_polto_ |
JoeJulian, .. oh.. not an option for us. |
13:19 |
|
dusmant joined #gluster |
13:25 |
|
vpshastry joined #gluster |
13:28 |
|
bennyturns joined #gluster |
13:35 |
|
RameshN joined #gluster |
13:38 |
|
hybrid5121 joined #gluster |
13:39 |
|
pk left #gluster |
13:41 |
wica |
Does anyone have glusterfs support in Openstack Havana on Ubuntu 12.04 LTS? https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1224517 |
13:41 |
glusterbot |
<http://goo.gl/b3P56P> (at bugs.launchpad.net) |
13:41 |
|
_polto_ joined #gluster |
13:41 |
|
_polto_ joined #gluster |
13:44 |
|
abyss joined #gluster |
13:57 |
|
davidbierce joined #gluster |
13:59 |
|
japuzzo joined #gluster |
14:09 |
|
ctria joined #gluster |
14:13 |
|
nhm joined #gluster |
14:16 |
|
plarsen joined #gluster |
14:22 |
|
MarkR joined #gluster |
14:23 |
|
B21956 joined #gluster |
14:27 |
|
vpshastry left #gluster |
14:27 |
social |
/o\ |
14:28 |
|
dbruhn joined #gluster |
14:30 |
samppah |
JoeJulian: thanks |
14:31 |
samppah |
MarkR: did you notice? https://github.com/semiosis/puppet-gluster/blob/master/gluster/manifests/server.pp |
14:31 |
glusterbot |
<http://goo.gl/4Vn599> (at github.com) |
14:31 |
samppah |
not sure if this helps with nagios |
14:31 |
samppah |
check_log -F /var/log/glusterfs/etc-glusterfs-glusterd.vol.log -O /dev/null -q ' E ' : it is looking for E(rrors) in the log file |
14:32 |
MarkR |
samppah: nice, looks promising. |
14:32 |
samppah |
actually that's for controlling the nagios check so it should help :) |
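Wiring that check into Nagios can be as small as the sketch below. The object names, host, and seek-file path are assumptions; check_log ships with the stock Nagios plugins, and unlike the puppet example above this variant keeps an "old log" copy so only new errors alert.

    # commands.cfg
    define command{
        command_name  check_gluster_log
        command_line  $USER1$/check_log -F /var/log/glusterfs/etc-glusterfs-glusterd.vol.log -O /var/tmp/glusterd-log.old -q ' E '
        }

    # services.cfg (run via NRPE or locally on each gluster server)
    define service{
        use                  generic-service
        host_name            gluster-server-1
        service_description  GlusterFS log errors
        check_command        check_gluster_log
        }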
14:46 |
wica |
That puppte module I'm going to use to test a setup :0 |
14:46 |
wica |
s/puppte/puppet/ |
14:47 |
glusterbot |
wica: Error: I couldn't find a message matching that criteria in my history of 1000 messages. |
14:48 |
social |
a2: https://bugzilla.redhat.com/show_bug.cgi?id=1033576 issue seems to be fixed in master but from a quick look at the commits I don't know what fixed it |
14:48 |
glusterbot |
<http://goo.gl/tW3gtb> (at bugzilla.redhat.com) |
14:48 |
glusterbot |
Bug 1033576: unspecified, high, ---, sgowda, NEW , rm: cannot remove Directory not empty on path that should be clean already |
14:49 |
|
khushildep joined #gluster |
14:53 |
|
dneary joined #gluster |
14:54 |
|
msolo joined #gluster |
14:54 |
|
gmcwhistler joined #gluster |
14:54 |
glusterbot |
New news from newglusterbugs: [Bug 1033576] rm: cannot remove Directory not empty on path that should be clean already <http://goo.gl/tW3gtb> |
14:57 |
|
_polto_ joined #gluster |
14:57 |
|
_polto_ joined #gluster |
14:58 |
|
andreask joined #gluster |
15:05 |
|
dusmant joined #gluster |
15:06 |
glusterbot |
New news from resolvedglusterbugs: [Bug 968432] Running glusterfs + hadoop in production platforms with reasonable privileges idioms <http://goo.gl/DhgEP4> |
15:16 |
|
bugs_ joined #gluster |
15:17 |
|
ira joined #gluster |
15:18 |
|
ira joined #gluster |
15:18 |
|
y4m4_ joined #gluster |
15:18 |
|
_BryanHm_ joined #gluster |
15:19 |
|
wushudoin joined #gluster |
15:20 |
|
jskinner_ joined #gluster |
15:23 |
|
y4m4_ joined #gluster |
15:27 |
|
spechal_ joined #gluster |
15:27 |
spechal_ |
Just out of curiosity, can you mix file system types in your volumes? |
15:27 |
spechal_ |
i.e. xfs and ext4 |
15:27 |
|
rwheeler joined #gluster |
15:28 |
ndevos |
sure you can, but I'm not sure why you would want to do that |
15:29 |
spechal_ |
one of our more tenured sys admins has had bad experiences with xfs and would prefer if one of our replicas were ext4 in case the lights go out in the DC |
15:31 |
|
_polto_ joined #gluster |
15:31 |
|
_polto_ joined #gluster |
15:33 |
|
neofob joined #gluster |
15:34 |
|
bulde joined #gluster |
15:34 |
social |
spechal_: quite OT but are you really running irc session as root? |
15:35 |
spechal_ |
Yes, in a throw away VM |
15:36 |
spechal_ |
Testing Mavericks |
15:36 |
spechal_ |
Why? |
15:37 |
social |
nothing |
15:37 |
|
y4m4__ joined #gluster |
15:37 |
spechal_ |
Just rubs you the wrong way? ;) |
15:37 |
wica |
On most channels you get kicked when you are using the root account |
15:37 |
spechal_ |
hmm, haven't run across it yet, but thanks for the heads up in case it happens |
15:38 |
wica |
:) |
15:38 |
|
msciciel joined #gluster |
15:40 |
|
andreask joined #gluster |
15:43 |
|
msolo joined #gluster |
15:47 |
|
ndk joined #gluster |
15:48 |
|
eseyman joined #gluster |
16:00 |
|
chirino joined #gluster |
16:02 |
|
msolo joined #gluster |
16:16 |
|
jag3773 joined #gluster |
16:17 |
|
[o__o] left #gluster |
16:19 |
|
[o__o] joined #gluster |
16:24 |
|
sprachgenerator joined #gluster |
16:25 |
|
hagarth joined #gluster |
16:51 |
|
tqrst joined #gluster |
16:51 |
|
bigclouds joined #gluster |
16:56 |
|
sroy_ joined #gluster |
17:09 |
|
msolo joined #gluster |
17:11 |
|
msolo left #gluster |
17:14 |
|
aliguori joined #gluster |
17:20 |
|
jbd1 joined #gluster |
17:24 |
glusterbot |
New news from newglusterbugs: [Bug 1033093] listStatus test failure reveals we need to upgrade tests and then : Fork code OR focus on MR2 <http://goo.gl/iIw5ci> |
17:31 |
|
zaitcev joined #gluster |
17:32 |
|
Mo__ joined #gluster |
17:34 |
semiosis |
:O |
17:35 |
samppah |
:O |
17:35 |
hagarth |
:O |
17:43 |
|
Technicool joined #gluster |
18:00 |
dbruhn |
Anyone using yum-versionlock to keep their gluster systems from upgrading? |
18:11 |
|
dneary_ joined #gluster |
18:15 |
|
wushudoin| joined #gluster |
18:26 |
|
bigclouds_ joined #gluster |
18:27 |
|
rotbeard joined #gluster |
18:28 |
|
_polto_ joined #gluster |
18:28 |
|
_polto_ joined #gluster |
18:47 |
|
plarsen joined #gluster |
18:54 |
|
_BryanHm_ joined #gluster |
18:55 |
glusterbot |
New news from newglusterbugs: [Bug 1034398] qemu-kvm core dump when mirroring block using glusterfs:native backend <http://goo.gl/ugzvMb> |
19:05 |
|
_pol joined #gluster |
19:09 |
|
_pol joined #gluster |
19:15 |
|
_pol joined #gluster |
19:18 |
|
harold_ joined #gluster |
19:40 |
|
rwheeler joined #gluster |
19:43 |
|
kobiyashi joined #gluster |
19:44 |
kobiyashi |
can someone please tell me what '0-marker: invalid argument: loc->parent' means in /var/log/gluster/bricks/<vol>.log |
19:44 |
kobiyashi |
on glusterfs3.4.1-3 |
19:45 |
cyberbootje |
Hi, i'm having some issues with gluster storage. Setup is replica 2 and something went wrong with the network backbone and now brick2 is detached from the network. |
19:45 |
cyberbootje |
You can guess what i want to do, reconnect brick2, but it has to be 100% safe, we are talking about 1.3TB data so what is best? delete the files on brick 2 before i reconnect? |
19:47 |
dbruhn |
cyberbootje, what version are you running? |
19:48 |
dbruhn |
and did the replication pair simply get disconnected, and now you are wondering about reattaching it? |
19:48 |
cyberbootje |
3.3.1 |
19:49 |
cyberbootje |
yes switch got disconnected |
19:54 |
dbruhn |
Then you should simply be able to restart the services on the disconnected member and be good to go. I am assuming you are running the self heal daemon? |
19:54 |
dbruhn |
Any data written to the system while the second server was down will simply be written back into place |
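For a member that only lost network connectivity, the reattach dbruhn describes is just a service restart plus a heal check; a sketch for 3.3.x (the volume name and the init-script name are assumptions):

    # on the server that was cut off: bring glusterd and its bricks back up
    service glusterd restart

    # from any peer: confirm the pool and the bricks are online again
    gluster peer status
    gluster volume status myvol

    # kick the self-heal daemon and watch it catch the brick up
    gluster volume heal myvol
    gluster volume heal myvol info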
19:59 |
|
plarsen joined #gluster |
20:05 |
|
bit4man joined #gluster |
20:14 |
cyberbootje |
dbruhn: can i remove the data on the disconnected member and reconnect just to be sure the data is consistent? |
20:14 |
|
zerick joined #gluster |
20:14 |
dbruhn |
It's not quite that simple |
20:17 |
dbruhn |
Here is an article on JoeJulian's blog that might help you. |
20:17 |
dbruhn |
http://joejulian.name/blog/replacing-a-glusterfs-server-best-practice/ |
20:17 |
glusterbot |
<http://goo.gl/pwTHN> (at joejulian.name) |
20:18 |
dbruhn |
I guess technically it would work, but read that blog entry. |
20:18 |
cyberbootje |
thx |
20:20 |
dbruhn |
If the data isn't consistent you will need to fix the split-brain issue |
20:20 |
dbruhn |
and from my experience the data isn't accessible until that's resolved. |
20:20 |
dbruhn |
Only files that were being written at the time of disconnect should be in this state. |
20:24 |
cyberbootje |
well |
20:24 |
dbruhn |
The self heal daemon uses a ton of CPU when it's doing its thing, which is the biggest reason why you don't want to go about it the way you mentioned. |
20:24 |
cyberbootje |
there are 2 members, they used to be in sync |
20:25 |
cyberbootje |
one is not anymore and the other is still live and working |
20:25 |
glusterbot |
New news from newglusterbugs: [Bug 1032894] spurious ENOENTs when using libgfapi <http://goo.gl/x7C8qJ> |
20:26 |
cyberbootje |
i just need the second one to join again, no hardware has been swapped or anything, the only thing that happened was the network cable got unplugged |
20:27 |
|
andreask joined #gluster |
20:30 |
dbruhn |
yeah, so if you bring it online it will sync up, any problem files will simply have a split brain error that you can correct |
20:30 |
dbruhn |
if you remove all the data from the suspect one, you might as well be putting new hardware in place |
20:31 |
dbruhn |
yes that will absolutely ensure that the data is consistent between the gluster peers, but it will come with a heavy resource use while it's doing it |
20:34 |
cyberbootje |
well that's ok |
20:35 |
cyberbootje |
is there a manual available how to deal with healing, logging and split brains? |
20:36 |
dbruhn |
Here is the split brain stuff http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ |
20:36 |
glusterbot |
<http://goo.gl/FPFUX> (at joejulian.name) |
20:37 |
dbruhn |
The rest is in the manual on the wiki |
20:39 |
|
ricky-ti1 joined #gluster |
20:49 |
|
badone joined #gluster |
20:52 |
|
bigclouds joined #gluster |
20:54 |
|
_polto_ joined #gluster |
20:54 |
|
_polto_ joined #gluster |
20:57 |
kobiyashi |
has anybody experienced STAT() /path/<filename.cfm> => -1 (Structure needs cleaning) from a FUSE client? |
20:58 |
kobiyashi |
version glusterfs3.4.1-3 |
20:58 |
|
_pol joined #gluster |
20:59 |
kobiyashi |
@glusterbot do you know about "Structure needs cleaning" from a fused client ? |
21:00 |
JoeJulian |
kobiyashi: Last time I saw that it was an xfs error on the brick. |
21:01 |
cyberbootje |
JoeJulian: Hi, i just read your post on split brain issue's |
21:01 |
JoeJulian |
I hope it was helpful. |
21:01 |
cyberbootje |
it was, just that, how do i know what GFID i have to remove? |
21:02 |
JoeJulian |
GFID=$(getfattr -n trusted.gfid --absolute-names -e hex ${BRICK}${SBFILE} | grep 0x | cut -d'x' -f2) |
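Combined with the split-brain post dbruhn linked earlier, the usual cleanup for a file-level split-brain is to delete the copy being discarded from one brick, including its .glusterfs hard link, and then let self-heal recreate it. A hedged sketch, with BRICK, SBFILE, and the volume name as placeholders:

    BRICK=/data/brick1            # brick whose copy is being thrown away
    SBFILE=/some/file             # path reported by 'heal ... info split-brain'

    GFID=$(getfattr -n trusted.gfid --absolute-names -e hex ${BRICK}${SBFILE} | grep 0x | cut -d'x' -f2)

    # remove the bad copy and its gfid hard link under .glusterfs
    rm -f "${BRICK}${SBFILE}"
    rm -f "${BRICK}/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID:0:8}-${GFID:8:4}-${GFID:12:4}-${GFID:16:4}-${GFID:20:12}"

    # let self-heal copy the good version back
    gluster volume heal myvol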
21:11 |
kobiyashi |
run that on the bricks, and if it's distributed-rep, then i should have results on both servers? |
21:12 |
kobiyashi |
@JoeJulian xfs repairing, how would you proceed with that, meaning stop glusterd, umount xfs filesystem, repair it, then reverse order? I'm just so afraid of data loss |
21:13 |
|
_pol_ joined #gluster |
21:14 |
y4m4_ |
ndevos: ping |
21:24 |
JoeJulian |
http://xfs.org/index.php/XFS_FAQ#Q:_I_see_applications_returning_error_990_or_.22Structure_needs_cleaning.22.2C_what_is_wrong.3F |
21:24 |
glusterbot |
<http://goo.gl/Hn26r1> (at xfs.org) |
21:24 |
JoeJulian |
kobiyashi: ^ |
21:25 |
JoeJulian |
kobiyashi: So stop the brick that's producing the error, xfs_repair, "gluster volume start $vol force" to restart the brick. If you're worried about the targeted file being corrupt, delete it and "heal ... full". |
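In command form, the procedure JoeJulian outlines looks roughly like this; the volume name, mount point, and device are assumptions, and xfs_repair must only be run on an unmounted filesystem:

    # find the PID of the brick throwing 'Structure needs cleaning' and stop it
    gluster volume status myvol
    kill <pid-of-bad-brick>

    # repair the brick filesystem offline, then remount it
    umount /data/brick1
    xfs_repair /dev/sdb1
    mount /data/brick1

    # restart the stopped brick; if the affected file looks corrupt,
    # delete it from the brick first and run a full heal
    gluster volume start myvol force
    gluster volume heal myvol full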
21:27 |
social |
a2: ping? |
21:27 |
social |
or any other dev? |
21:44 |
|
failshell joined #gluster |
21:48 |
|
gmcwhist1 joined #gluster |
21:50 |
davidbierce |
Is there still a compelling reason to shift away from a Node that is just a big RAIDed brick, to a Node that has each HD as a single brick? |
21:51 |
kobiyashi |
thanks @JoeJulian i'll give that a try |
21:51 |
davidbierce |
Obviously lots of Nodes/Servers. |
21:53 |
JoeJulian |
davidbierce: Not necessarily. Depends on what works for your use case. |
21:56 |
davidbierce |
Yeah, still torn :) Lots of Virtual Machine images that are between 50GB and 2TB |
21:57 |
|
calum_ joined #gluster |
21:59 |
JoeJulian |
Consider each potential failure point and what that risk means to your design. Also consider disaster recovery when each of those failure points does fail. |
22:03 |
|
Gutleib joined #gluster |
22:05 |
|
_pol joined #gluster |
22:05 |
Gutleib |
Hi! I have a bunch of stupid questions about gluster, is there anyone with knowledge of how it works? No configs here, just what-does-what type of questions... |
22:08 |
|
pdrakeweb joined #gluster |
22:09 |
elyograg |
I know that "E" and "W" mean error and war in gluster logs, but what does "C" mean? |
22:11 |
|
gdubreui joined #gluster |
22:18 |
elyograg |
ah, critical. (grepping the source code) |
22:19 |
Gutleib |
I have 2 servers acting as gluster servers; they are also both gluster clients. And there are some additional clients that connect to the gluster servers. Now, the gluster servers have 2 nics, one for interconnection, the other for everything else. I probed the servers through the interconnection lan, so it shows its ip addresses when I ask for status. Then I tried to connect with a gluster client from the outer lan. Works fine, and, if there were no interconnect lan, it would |
22:19 |
Gutleib |
auto connect to both nodes. But the question is, since the bricks were probed on the other network, how does discovery happen? Does the client find the other server by its outer ip? |
22:20 |
JoeJulian |
@hostnames |
22:20 |
glusterbot |
JoeJulian: Hostnames can be used instead of IPs for server (peer) addresses. To update an existing peer's address from IP to hostname, just probe it by name from any other peer. When creating a new pool, probe all other servers by name from the first, then probe the first by name from just one of the others. |
22:20 |
JoeJulian |
@mount server |
22:20 |
glusterbot |
JoeJulian: The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns |
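In mount terms, that factoid means the server named in the mount is only a bootstrap, and a fallback for that first fetch can be given explicitly. A sketch (server and volume names are placeholders; backupvolfile-server is the option understood by the 3.4 mount.glusterfs helper, and rrdns is the alternative glusterbot refers to):

    # fetch the volume definition from server1, or server2 if it is down;
    # once mounted, the client connects to every brick in the volume
    mount -t glusterfs -o backupvolfile-server=server2 server1:/myvol /mnt/myvol

    # /etc/fstab equivalent
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,backupvolfile-server=server2  0 0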
22:22 |
Gutleib |
Thanks! But won't it contaminate outer traffic, since it's round robin? |
22:23 |
Gutleib |
they're asymmetric in my case |
22:23 |
JoeJulian |
The fuse clients connect to ALL of the bricks. That's the way it works. |
22:24 |
Gutleib |
it's just, where does it get all the bricks' ips? |
22:24 |
JoeJulian |
round robin would just be in case a server was down, then your client would still be able to mount the volume and access the data (assuming replication). |
22:25 |
JoeJulian |
If you're using dns or /etc/hosts, from the hostnames. IF you define your volumes using IPs, then all the clients will have to be able to reach those specific IPs. |
22:26 |
elyograg |
JoeJulian: i realize that this is a really basic question. what would cause a 42 second timeout error during a rebalance? The NICs show no errors. The switches have no log entries and show no interface errors. There are no kernel log messages anywhere near the time. |
22:28 |
elyograg |
looking over what my developer has said regarding our problems, it appears to all start with a client ping timeout. |
22:28 |
Gutleib |
JoeJulian: understood. So, when the gluster client discovers a volume, the gluster server provides the contents of its /etc/hosts for balancing, right? |
22:28 |
JoeJulian |
42 seconds would be a ping-timeout. The client could not communicate with a server for 42 seconds. That would be either a network issue, or an unresponsive server. |
22:29 |
JoeJulian |
Gutleib: no. |
22:30 |
JoeJulian |
Gutleib: It provides the volume definition. So the client gets a list of bricks to connect to. If you define the bricks by hostname, the client does a lookup to get the IP. |
22:30 |
elyograg |
how often do network issues come with no log entries or errors anywhere? |
22:30 |
elyograg |
I've never seen it myself. |
22:30 |
JoeJulian |
Gutleib: For what you're describing that you want done, that sounds like it would be to your advantage. You define the hostnames to your "internal" ip addresses as resolved by your servers, and your "external" addresses as resolved by your clients. |
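Concretely, that is split-horizon name resolution: the volume is defined with hostnames, and servers and clients resolve those names to different addresses. A sketch with made-up names and addresses:

    # /etc/hosts on the gluster servers: peers talk over the interconnect LAN
    10.0.0.1      gluster1
    10.0.0.2      gluster2

    # /etc/hosts (or a DNS view) on the clients: same names, outer-LAN addresses
    192.168.1.11  gluster1
    192.168.1.12  gluster2

    # because the bricks are defined by hostname, each side takes its own route
    gluster volume create myvol replica 2 gluster1:/data/brick1 gluster2:/data/brick1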
22:31 |
JoeJulian |
elyograg: I don't have ping-timeouts so that's a tough one for me to answer. :( |
22:32 |
elyograg |
fpaste.org is having db connection errors. fun. |
22:33 |
JoeJulian |
Maybe they're on the same network... ;) |
22:33 |
elyograg |
heh. |
22:33 |
Gutleib |
JoeJulian: yeah! went googling further. Thanks! You've helped a lot and very quick, too! |
22:33 |
elyograg |
my network interfaces are bonded, but problems with the bonding should show up in the kernel log. |
22:34 |
JoeJulian |
I would think so... |
22:34 |
elyograg |
and the timeouts for that are in milliseconds. well below the ping timeout. |
22:35 |
elyograg |
here's a duplicate of the info that i put on the mailing list. http://fpaste.org/56725/13854180/ |
22:35 |
glusterbot |
Title: #56725 Fedora Project Pastebin (at fpaste.org) |
22:36 |
elyograg |
I'm recreating the rebalance on a smaller scale on my testbed. I'm now worried that it's not going to have the same problems, since it all seems to have been kicked off by a ping timeout. |
22:36 |
JoeJulian |
http://irclog.perlgeek.de/gluster/2013-07-02#i_7278171 |
22:36 |
glusterbot |
<http://goo.gl/cSEMft> (at irclog.perlgeek.de) |
22:37 |
|
geewiz joined #gluster |
22:38 |
elyograg |
on the server that it says it couldn't reach for 42 seconds, there are entries all through the alleged timeout period in etc-glusterfs-glusterd.vol.log-20131103 |
22:39 |
JoeJulian |
Perhaps the server was in some iowait and wasn't responding to TCP? |
22:39 |
JoeJulian |
Just a guess. |
22:39 |
|
pdrakeweb joined #gluster |
22:39 |
elyograg |
that's at least theoretically possible. There are two dell hardware RAID5 arrays, each of which is divided by LVM into four brick filesystems. |
22:40 |
elyograg |
4TB SAS drives. |
22:40 |
|
Gilbs1 joined #gluster |
22:40 |
|
failshel_ joined #gluster |
22:40 |
JoeJulian |
Was this the box that was having trouble mounting because the network was taking too long to come up? |
22:41 |
elyograg |
I don't recall ever having mounting problems. i can't recall whether I had any on the testbed, significantly less capable hardware. |
22:43 |
JoeJulian |
Guess not then. Some dell within the last week. I don't remember whose. |
22:43 |
elyograg |
yay, my wife is nearby and can take me home. no train! |
22:43 |
JoeJulian |
Awe... your train ride must not be as nice as mine. |
22:44 |
elyograg |
I have to catch a shuttle or bus first, then ride for nearly an hour on seats that aren't terribly comfortable. then I have a walk that's about 7 minutes. |
22:45 |
elyograg |
afk for a bit. |
22:45 |
JoeJulian |
Yeah, I have to do the bus (or today I biked) 3 miles to the train, but the seats are nice and cushy and the rails are adjacent to Puget Sound almost all the way. |
22:47 |
|
SFLimey joined #gluster |
22:49 |
semiosis |
JoeJulian on Rails |
22:49 |
JoeJulian |
Hehe |
22:49 |
|
khushildep joined #gluster |
22:50 |
SFLimey |
I'm super new to Gluster and am troubleshooting some errors I get from our Nagios server. Tried googling them but only two things came back and both were logs from this IRC channel. |
22:50 |
SFLimey |
Does anyone know what gluster;CRITICAL;HARD;3;check_glusterfs CRITICAL peers: 10.240.5.169/Connected volumes: gv0/1 unsynchronized entries means? |
22:50 |
|
y4m4_ joined #gluster |
22:51 |
SFLimey |
and should I be worrying about them? |
22:52 |
JoeJulian |
Where's "check_glusterfs" come from? |
22:52 |
SFLimey |
Its from my Nagios logs. |
22:52 |
JoeJulian |
nope. Logs don't write scripts (usually). |
22:58 |
SFLimey |
For sure, just having a hard time figuring out what's triggering the alerts. I should keep digging. We're running v3.11; should we think about upgrading? |
23:01 |
JoeJulian |
SFLimey: Absolutely. My question was more about where did you get the script that's producing that alert? If it's home grown, fpaste it so we can see what it considers a critical alert. |
23:06 |
semiosis |
SFLimey: how did you install glusterfs? |
23:07 |
a2 |
social, pong? |
23:07 |
Gilbs1 |
Is there any documentation on the best way to recover from a disaster with your geo-replication slave volume? Our plan is to point our servers to the slave volume if a disaster happens to our main site. What would be the best way of returning "new" data back to the master volume, reversing the geo-replication direction? |
23:10 |
|
nage joined #gluster |
23:10 |
|
nage joined #gluster |
23:15 |
JoeJulian |
lol... |
23:15 |
JoeJulian |
semiosis: Check this one out: http://www.fpaste.org/56734/21335138/ |
23:15 |
glusterbot |
Title: #56734 Fedora Project Pastebin (at www.fpaste.org) |
23:16 |
JoeJulian |
a2: You might be interested in that too. |
23:17 |
semiosis |
JoeJulian: istr seeing that before |
23:17 |
semiosis |
after the stat, can you rm? |
23:17 |
semiosis |
oh try stat'ing the parent directory |
23:17 |
semiosis |
then rm again |
23:18 |
JoeJulian |
Done both... no joy. |
23:18 |
semiosis |
hrm |
23:18 |
JoeJulian |
I'm looking at the bricks now. |
23:18 |
semiosis |
something about a cached dirent |
23:18 |
semiosis |
in gluster process memory |
23:18 |
semiosis |
but i dont remember if i had to remount the client or restart the bricks |
23:19 |
semiosis |
s/remount/just remount/ |
23:19 |
semiosis |
glusterbot: meh |
23:19 |
glusterbot |
semiosis: Error: I couldn't find a message matching that criteria in my history of 1000 messages. |
23:19 |
glusterbot |
semiosis: I'm not happy about it either |
23:19 |
* JoeJulian |
smacks glusterbot with a wet trout. |
23:22 |
JoeJulian |
Hmm, it's on the bricks, the attributes are clean, and the gfids match. |
23:30 |
|
jag3773 joined #gluster |
23:42 |
|
d-fence joined #gluster |
23:42 |
|
Gilbs1 left #gluster |
23:49 |
Gutleib |
Uhmmm |
23:50 |
Gutleib |
I have some practical questions now… I removed my previous setup, and made the records in /etc/hosts different for the server/clients and the plain clients |
23:51 |
|
failshell joined #gluster |
23:51 |
|
bigclouds_ joined #gluster |
23:52 |
Gutleib |
now when i try to create a volume it first gives a warning "Failed to perform brick order check", and then fails with "/my-underlying-folder or a prefix of it is already part of a volume" |
23:52 |
glusterbot |
Gutleib: To clear that error, follow the instructions at http://goo.gl/YUzrh or see this bug http://goo.gl/YZi8Y |
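The linked instructions boil down to clearing the markers the previous volume left on the brick directory; a sketch using the path from Gutleib's error (only do this on a directory you really intend to reuse as a brick):

    setfattr -x trusted.glusterfs.volume-id /my-underlying-folder
    setfattr -x trusted.gfid /my-underlying-folder
    rm -rf /my-underlying-folder/.glusterfs

    # if the complaint is about "a prefix of it", the parent directories may
    # carry the same xattrs and need the same treatment
    service glusterd restart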
23:53 |
|
_BryanHm_ joined #gluster |
23:54 |
Gutleib |
wow. information century magic. glusterbot is awesome, even though it smells of fish |
23:58 |
Gutleib |
ow |
23:58 |
|
helmo joined #gluster |
23:58 |
JoeJulian |
semiosis: Yep, that was totally client-state. I mounted the volume to a different directory and was able to delete it no problem. |
23:58 |
Gutleib |
but Failed to perform brick order check seems unrelated |
23:58 |
kobiyashi |
@JoeJulian: did you ever find out why you couldn't rm that file? I'm seeing something similar in my setup, where a file is created off samba, while the underlying filesystem is FUSE mounted |
23:59 |
kobiyashi |
also the volume is mounted on apache to serve.... |
23:59 |
semiosis |
JoeJulian: i think you could disable one of the performance xlators to fix up that running client, but idk which one it would be |
23:59 |
kobiyashi |
my developers use a file editor, save, browse, and they do not see their changes made via the web browser, like there's some funky caching |
23:59 |
JoeJulian |
Gutleib: Yep, that's unrelated... That means that your replication will occur on the same server. |
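The brick-order warning goes away when consecutive bricks in each replica set are listed on different servers; a sketch with placeholder names for a replica-2 volume across two servers:

    # wrong for replica 2: the first pair (serverA:/b1, serverA:/b2) puts both
    # copies on one machine, which is what the check warns about
    #   gluster volume create myvol replica 2 serverA:/b1 serverA:/b2 serverB:/b1 serverB:/b2

    # right: alternate servers so every replica pair spans both machines
    gluster volume create myvol replica 2 \
        serverA:/data/brick1 serverB:/data/brick1 \
        serverA:/data/brick2 serverB:/data/brick2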