
IRC log for #fuel, 2014-07-07


All times shown according to UTC.

Time Nick Message
05:37 IlyaE joined #fuel
05:45 Longgeek joined #fuel
06:29 al_ex8 joined #fuel
06:34 Ch00k joined #fuel
06:57 neith joined #fuel
06:58 ddmitriev joined #fuel
07:02 IlyaE joined #fuel
07:16 odyssey4me joined #fuel
07:38 eshumakher joined #fuel
07:45 pasquier-s joined #fuel
08:04 saibarspeis joined #fuel
08:10 al_ex8 joined #fuel
08:13 guillaume__ joined #fuel
08:16 omelchek joined #fuel
08:29 brain461 joined #fuel
08:31 AndreyDanin joined #fuel
08:36 guillaume__ joined #fuel
08:44 al_ex8 joined #fuel
09:03 DaveJ__ joined #fuel
09:30 geekinutah joined #fuel
09:59 odyssey4me joined #fuel
10:23 artem_panchenko joined #fuel
10:54 guillaume__ joined #fuel
11:14 pasquier-s joined #fuel
11:17 guillaume__ joined #fuel
11:20 brain461 joined #fuel
12:30 accela-dev joined #fuel
12:30 jbellone_ joined #fuel
12:31 jbellone_ joined #fuel
12:32 jseutter joined #fuel
12:50 strictlyb joined #fuel
12:50 TVR_ morning all
13:30 brain461 joined #fuel
13:32 e0ne joined #fuel
13:37 AndreyDanin joined #fuel
14:03 ogelbukh joined #fuel
14:25 mattgriffin joined #fuel
14:27 al_ex8 joined #fuel
14:28 Kiall joined #fuel
14:28 Kiall Heya - Does anyone happen to have a public link for the fix to https://bugs.launchpad.net/fuel/+bug/1312212 ?
14:32 mattymo Kiall, it's nothing special. open nova/openstack/common/db/sqlalchemy/session.py and add '1047' to the list of connection error codes
14:33 Kiall Ah - That's what I thought they might be, but saw references to oslo db which we can't just patch like that
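A minimal sketch of the workaround mattymo describes, not the official patch: add MySQL error code 1047 (raised by Galera while a node is not yet synced) to the connection-error codes that oslo's session.py treats as retryable, then restart the affected nova services. The file path and the exact contents of the tuple are assumptions based on the oslo-incubator copy bundled with nova at the time; adjust both to your distro and release.

    # run on each controller; path is an example for a CentOS-based Fuel node
    # (on Ubuntu it lives under /usr/lib/python2.7/dist-packages instead)
    F=/usr/lib/python2.6/site-packages/nova/openstack/common/db/sqlalchemy/session.py
    cp "$F" "$F.bak"
    # assumes the stock error-code tuple ('2002', '2003', '2006') in _is_db_connection_error()
    sed -i "s/('2002', '2003', '2006')/('1047', '2002', '2003', '2006')/" "$F"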
14:36 Longgeek joined #fuel
14:40 jaypipes joined #fuel
14:58 IlyaE joined #fuel
15:04 e0ne joined #fuel
15:47 thehybridtech joined #fuel
15:52 IlyaE joined #fuel
16:06 xarses joined #fuel
16:17 pasquier-s_ joined #fuel
16:26 angdraug joined #fuel
16:41 IlyaE joined #fuel
17:13 IlyaE joined #fuel
17:14 jobewan joined #fuel
17:14 dilyin joined #fuel
17:37 IlyaE joined #fuel
17:52 Kupo24z xarses: MiroslavAnashkin how does fuel deal with failed & replaced nodes? is it possible to remove a node post-deployment or will there just be a ghost entry always?
17:56 MiroslavAnashkin Kupo24z: Yes, you simply remove the failed node, Deploy changes, add the failed node again, Deploy changes.
17:56 xarses Kupo24z: fuel will allow you to remove the failed node. In the case of a controller it should be replaced; Puppet will re-run on all controllers when any controller role is deployed. For the other roles, the node will simply be deleted. If you deploy another node with that role, it likely won't re-deploy all of the other roles in the same way. If that role had leftover configuration then there may be some ghost config left
18:01 Kupo24z xarses: this is specifically in a production install assuming the deployment went fine and an OS disk just failed, we would add a new compute/storage/controller and delete the old. will this mess with the existing config or is it relatively painless?
18:03 xarses Kupo24z: it will mess with the existing controller configs, so any customizations not in the manifest will likely be reverted
18:04 xarses for computes/storage the delete or add does nothing to the rest of the env
18:04 xarses in the case of ceph, you'd have to mark the osd out yourself
18:04 xarses in the case of compute, you'd need to remove the compute from the nova service-list yourself
18:05 xarses if you explode a controller, we clean up the mess, because the clustering configs need to know about the other nodes.
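For reference, a rough sketch of the manual cleanup xarses describes for the non-controller roles (the osd id, hostname and service names here are examples, not taken from this environment):

    # ceph-osd node removed: take the dead OSD out and drop its cluster entries
    ceph osd out osd.3
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3
    # compute node removed: find the stale service record and disable it
    nova service-list
    nova service-disable node-5.domain.tld nova-compute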
18:07 Kupo24z Is it possible to do it without a support ticket being made?
18:15 Kupo24z & are there any plans for fuel to use linux raid/mdadm for the OS partitions?
18:16 hyperbaba joined #fuel
18:16 hyperbaba hi there
18:16 MiroslavAnashkin Yes, and for the second part - we plan soft RAID support in 6.0 and still hope to make it available as an experimental feature in 5.1
18:17 hyperbaba I have a major problem with openstack deployed with fuel 5.0. I am unable to add an osd to the ceph cluster. As soon as I do the required ceph-deploy commands the osd is marked down and won't come up.
18:17 hyperbaba can anyone help please. I have been going in circles for days
18:20 IlyaE joined #fuel
18:22 MiroslavAnashkin hyperbaba: Are you trying to add a new OSD node manually? Or did you use the Fuel UI to add one more node with the OSD role?
18:22 hyperbaba MiroslavAnashkin: I am trying to add an osd manually on an already deployed node
18:23 hyperbaba MiroslavAnashkin: Wrongly put. I already deployed the node using fuel. I'm just trying to add an osd disk
18:24 hyperbaba MiroslavAnashkin: Because during deployment one of the disks was not provisioned (ceph-deploy returned 1 instead of 0)
18:29 MiroslavAnashkin hyperbaba: I found the following failed CEPH OSD drive replacement guide the most detailed: http://karan-mj.blogspot.ru/2014/03/admin-guide-replacing-failed-disk-in.html
18:30 hyperbaba MiroslavAnashkin: not working. As soon as I deploy a new disk it's marked down
18:30 hyperbaba tried 10 times now
18:34 hyperbaba MiroslavAnashkin: I have a bunch of problems with ceph when using fuel 5.0. When I first deployed openstack manually from scratch with ceph, none of those problems occurred. Is 5.1 any better regarding ceph?
18:37 MiroslavAnashkin hyperbaba: Xarses should know more about our future improvements/bug fixes for Ceph
18:37 MiroslavAnashkin hyperbaba: What does `ceph-deploy disk list <node name>` show after you ran ceph deploy?
18:38 angdraug for one, 5.1 will have upgraded ceph (to Firefly) and ceph-deploy (to 1.5)
18:39 hyperbaba MiroslavAnashkin: for disk in question : [node-22][INFO  ] /dev/sdc :
18:39 hyperbaba [node-22][INFO  ]  /dev/sdc1 ceph data, active, cluster ceph, osd.3, journal /dev/sdc2
18:39 hyperbaba [node-22][INFO  ]  /dev/sdc2 ceph journal, for /dev/sdc1
18:41 MiroslavAnashkin And what was the failed disk name?
18:43 tatyana joined #fuel
18:43 hyperbaba osd.3 on /dev/sdc
18:43 hyperbaba it was not failed
18:44 hyperbaba it was not deployed properly during provisioning from fuel. I've kicked out all references to that disk from ceph and tried to add it as a new one. Even tried with a new osd number. No luck
18:45 hyperbaba the disk is ok. Mounted and fs created on it. Even the ceph-osd process is running for it. But from the cluster view it's marked down as soon as it's ceph-deployed
18:47 hyperbaba I've searched all the logs on all the nodes. Found nothing to give me a clue why it's down. I even tried "ceph osd in osd.3" to force it in but after the "down interval of 300 sec" it was marked out again
18:49 hyperbaba MiroslavAnashkin: do you have the deployment steps from puppet so I can see what fuel is doing where ceph is concerned?
18:53 MiroslavAnashkin There are `ceph-deploy osd prepare ${devices}` and `ceph-deploy osd activate ${devices}`
18:55 hyperbaba MiroslavAnashkin: are you doing ceph-deploy disk zap in the recipe? It is needed for redeployment scenarios, so xfs can be created again over the old one
18:56 Kupo24z MiroslavAnashkin: I'm doing an Ubuntu fuel 5 HA install and one controller says ready and the other two are hung at about 50%, anything I can check? No errors reported in the logs tab
18:56 Kupo24z & the last log entry was 30 mins ago
18:57 MiroslavAnashkin Kupo24z: Depends on the set of roles assigned to the controllers. The timeout is 2 hours
19:00 MiroslavAnashkin Kupo24z: You may ssh to the stalled controllers and check the network connections on the admin, management and storage networks.
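A quick sketch of that check (the node name and addresses are examples): from the Fuel master, ssh to the stalled node and confirm it can reach its peers on each of those networks.

    ssh node-2                   # from the Fuel master
    ip -4 addr show              # confirm the admin/management/storage interfaces are configured
    ping -c 3 192.168.0.2        # example: another controller's management address
    ping -c 3 192.168.1.2        # example: another controller's storage address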
19:01 MiroslavAnashkin hyperbaba: Please run the following command first, to check if your OSD works
19:01 MiroslavAnashkin `ceph-deploy osd activate node-22:sdc`
19:02 MiroslavAnashkin If that does not help - please run the following sequence. You may have to give Ceph some time to re-balance the cluster between the steps.
19:02 MiroslavAnashkin `ceph-deploy disk zap node-22:sdc`
19:02 MiroslavAnashkin `ceph-deploy --overwrite-conf osd prepare node-22:sdc`
19:02 MiroslavAnashkin `ceph-deploy osd activate node-22:sdc`
19:03 hyperbaba I've done exactly those commands. Not working
19:03 hyperbaba after the last command ceph osd tree shows that the osd is down
19:03 hyperbaba after the "down interval" of 3000 seconds it is marked autoout and down
19:04 IlyaE joined #fuel
19:04 hyperbaba when doing ceph-deploy osd activate you can't refer to the disk. Instead you must refer to the created partition
19:05 hyperbaba or you will get something like this from ceph-deploy: NameError: global name 'ret' is not defined
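A short sketch of the distinction hyperbaba is pointing out, using the node and disk from this log: prepare is given the raw device, activate is given the data partition that prepare created.

    ceph-deploy --overwrite-conf osd prepare node-22:sdc
    ceph-deploy osd activate node-22:/dev/sdc1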
19:07 xarses hyperbaba: can you paste your /root/ceph.log?
19:09 tatyana joined #fuel
19:12 hyperbaba Internal server error on paste.openstack.org :(
19:13 Kupo24z i use http://pastebin.mozilla.org/ seems to be up all the time
19:14 hyperbaba http://tny.cz/ff0003ee . Password is 123
19:14 hyperbaba xarses http://tny.cz/ff0003ee . Password is 123
19:16 xarses hyperbaba: one thing to note is that you zapped sdc
19:16 xarses but sdc was part of the boot md
19:17 xarses so there might be issues there
19:17 * xarses keeps reviewing the log
19:17 hyperbaba xarses: ? This disk was brand new. Taken from the packaging. There was no md
19:18 xarses hmm it was replaced after fuel deployed?
19:19 hyperbaba no
19:19 hyperbaba it was not provisioned in the first run
19:19 xarses fuel will create a boot md across all of the disks, even if no disk role was specified for it
19:20 xarses if the disk was present when fuel deployed the node, it should be part of the boot md
19:21 hyperbaba but disk was marked for ceph in the configure panel for the node
19:21 xarses that's fine, it only uses about 100MB for it.
19:22 xarses it annoys me too, but it prevents a much worse issue if the system can't make up its mind about which device to boot from
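A small sketch for checking that, assuming the usual Fuel layout where the small boot RAID shows up as /dev/md0:

    cat /proc/mdstat             # list software RAID arrays and their member partitions
    mdadm --detail /dev/md0      # see whether a partition on sdc is (or ever was) a member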
19:23 xarses is this the /root/ceph.log from multiple nodes, or just the one in question?
19:23 hyperbaba The one in question
19:27 xarses ok, so these two commands failed, from your run by the looks of it. There is no useful log data in ceph.log for them
19:27 xarses 'ceph-disk-prepare', '--fs-type', 'xfs', '--cluster', 'ceph', '--', '/dev/sdc'
19:27 xarses exited 1
19:27 xarses '/sbin/sgdisk', '--new=2:0:2048M', '--change-name=2:ceph journal', '--partition-guid=2:1deacbc5-5d6e-4855-8483-ee4c374d2f29', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--mbrtogpt', '--', '/dev/sdc'
19:27 xarses exited 4
19:28 xarses further down Error: Partition(s) 2 on /dev/sdc have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
19:29 xarses and right after ceph-disk: Error: partition 1 for /dev/sdc does not appear to exist
19:30 hyperbaba Those errors happened while fuel was provisioning the node. I am aware of them; I wanted to include this disk in the ceph cluster after deployment finished.
19:30 hyperbaba And whatever I do, the osd stays in the down state.
19:30 xarses if you added the node to the ceph-osd role, fuel should have included it
19:31 hyperbaba but it did not
19:31 hyperbaba only sdb was part of the cluster after deployment
19:31 hyperbaba I simply wanted to add this sdc to the cluster and ran into problems
19:32 hyperbaba I've removed that osd reference from the ceph crush map and friends
19:32 hyperbaba did ceph-deploy commands to add this sdc disk
19:32 hyperbaba and it is part of the cluster but it's marked down
19:32 xarses /var/lib/ceph/osd/ceph-<osd num> mounted?
19:32 hyperbaba yed
19:32 hyperbaba yes
19:33 hyperbaba and the corresponding ceph-osd process is running
19:33 hyperbaba no errors regarding that osd.3 on any other node
19:33 hyperbaba and it's still down
19:33 hyperbaba ceph osd tree|grep osd.3 : 3  0.91  osd.3  down  0
19:34 xarses can you paste /proc/mounts ?
19:34 e0ne joined #fuel
19:34 hyperbaba http://tny.cz/a92f5e95
19:45 xarses have you done 'ceph osd out osd.3' before?
19:46 hyperbaba yes
19:46 hyperbaba and after rebalancing did rm of the same
19:47 xarses and you also did the 'ceph osd crush remove osd.3'
19:48 hyperbaba yes. did that also
19:48 hyperbaba completely got rid of osd.3 from the ceph cluster
19:48 hyperbaba the resulting cluster was in good health
19:48 e0ne joined #fuel
19:48 xarses you noted the rm; did you also do the 'ceph auth del osd.3'?
19:48 hyperbaba yes
19:49 hyperbaba removed the auth keys also
19:50 hyperbaba I also did `stop ceph-osd id=3` and unmounted /dev/sdc1 after that
19:50 hyperbaba then i did the ceph-deploy stuff
19:50 hyperbaba got osd.3 again in the system
19:50 hyperbaba as new,exists but down
19:51 hyperbaba and after 300 seconds got this state in ceph osd dump : osd.3 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) :/0 :/0 :/0 :/0 autoout,exists 64863a51-ba8e-40d1-9784-16ad45d72fdc
19:57 xarses seems like the osd can't talk to the mon correctly
19:57 xarses not sure why
19:57 xarses it should just come back online
19:57 xarses there should be a /var/log/ceph/osd.3 log or similar
19:58 xarses if that's not helpful then you might want to use strace to attach to the ceph-osd for 3 and see if it's spewing any odd messages
19:58 hyperbaba will do that right away
20:00 hyperbaba hangs on : futex(0x7f2d841b59d0, FUTEX_WAIT, 17018, NULL
20:01 hyperbaba do i need to follow the forks also?
20:01 xarses http://meenakshi02.wordpress.com/2011/02/02/strace-hanging-at-futex/
20:02 xarses seems likely
20:03 hyperbaba I see connection timeout messages. But can't figure out which connections
20:05 xarses probably the monitor, you can see what the handle number is, and then compare it to the lsof output, grepping for the pid you are chasing
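A rough sketch of that debugging path (the osd id is the one from this log; the log file name and the process match pattern are assumptions based on ceph defaults):

    tail -n 100 /var/log/ceph/ceph-osd.3.log   # the per-OSD log xarses mentions
    PID=$(pgrep -f 'ceph-osd.*-i 3')           # assumes the daemon was started with "-i 3"
    strace -f -p "$PID" -e trace=network       # follow forks, watch only socket traffic
    lsof -a -p "$PID" -i TCP                   # map the timing-out descriptor to its TCP peer
    ceph -s                                    # and confirm this node can reach a monitor at all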
20:11 hyperbaba can't figure it out
20:13 hyperbaba are firewall rules part of the ceph recipe?
20:13 hyperbaba for puppet?
20:14 hyperbaba [pid 23778] open("/proc/loadavg", O_RDONLY) = 29
20:14 hyperbaba [pid 23778] statfs("/var/lib/ceph/osd/ceph-3", {f_type=0x58465342, f_bsize=4096, f_blocks=243547120, f_bfree=243538773, f_bavail=243538773, f_files=121833024, f_ffree=121832993, f_fsid={2081, 0}, f_namelen=255, f_frsize=4096}) = 0
20:15 hyperbaba these are the calls in the loop I keep seeing, along with the futex calls
20:15 hyperbaba so it's ok
20:15 hyperbaba but I can't figure out which tcp connections are needed
20:15 xarses hyperbaba: yes, if you had the ceph role on the node, then the firewall should already be open correctly
20:16 hyperbaba hmmm
20:16 hyperbaba out of ideas
20:17 xarses well the really smart ceph folks are on oftc.net #ceph
20:17 hyperbaba will try there
20:18 hyperbaba thank you very much
20:18 xarses you're welcome
20:18 xarses I'll stalk you there =)
20:22 hyperbaba :) are they on some different port? I can't connect
20:27 hyperbaba found it. What TZ are those guys in?
20:27 hyperbaba from Inktank, I mean?
20:44 xarses LA
20:44 xarses usually gmt-5 to gmt-7
20:44 xarses but there are lots of lurkers too
21:14 IlyaE joined #fuel
21:35 xarses joined #fuel
22:32 Kupo24z is 5.0.1 out yet or soon?
22:52 IlyaE joined #fuel
23:10 IlyaE joined #fuel
23:15 eshumakher joined #fuel
23:24 xarses joined #fuel
23:55 IlyaE joined #fuel
