
IRC log for #fuel, 2014-09-29


All times shown according to UTC.

Time Nick Message
00:31 Arminder- joined #fuel
00:38 jpf joined #fuel
00:46 harybahh joined #fuel
01:25 justif joined #fuel
02:27 Rajbir joined #fuel
02:47 harybahh joined #fuel
03:15 adanin joined #fuel
03:52 AKirilochkin joined #fuel
04:11 Longgeek joined #fuel
04:47 harybahh joined #fuel
05:03 ArminderS joined #fuel
05:12 Longgeek_ joined #fuel
05:19 anand_ts joined #fuel
05:19 HeOS joined #fuel
06:34 stamak joined #fuel
06:48 harybahh joined #fuel
06:48 flor3k joined #fuel
06:53 merdoc joined #fuel
07:04 harybahh joined #fuel
07:06 pasquier-s joined #fuel
07:28 avorobiov joined #fuel
07:43 adanin joined #fuel
07:47 adanin_ joined #fuel
07:48 Longgeek joined #fuel
07:50 pasquier-s_ joined #fuel
07:55 stamak joined #fuel
07:56 Arminder joined #fuel
07:58 dkaigarodsev joined #fuel
08:01 pal_bth joined #fuel
08:03 lordd joined #fuel
08:06 Rajbir joined #fuel
08:13 ddmitriev joined #fuel
08:17 HeOS joined #fuel
08:24 artem_panchenko left #fuel
08:24 artem_panchenko joined #fuel
08:25 lordd joined #fuel
08:28 flor3k joined #fuel
08:29 anand_ts hi all, I did the installation without internet for the Fuel master node, with a static IP in our network.
08:30 anand_ts After installation I connected the server to the network and tried to access the fuel dashboard over the network, but it is not connecting. what could be the problem?
08:34 merdoc can you reach the network from fuel?
08:34 kaliya anand_ts: is the master pingable?
08:35 flor3k joined #fuel
08:36 anand_ts kaliya: not pingable
08:36 anand_ts merdoc: nope, from fuel I also cannot ping the gateway or DNS
08:36 kaliya anand_ts: seems a local net configuration, you can run fuelmenu and setup the NIC accordingly
08:38 anand_ts kaliya: the problem during installation was that the Fuel server found 2 DHCP servers, so I unplugged the uplink to the switch and did the installation.
08:39 kaliya anand_ts: did you separate the segment where the fuel master is installed from others, by VLANs or so?
08:39 e0ne joined #fuel
08:40 kaliya anand_ts: if you or your network team did, seems a routing problem...
08:40 anand_ts kaliya: going to do that, before that I just tried it without connecting to switch
08:41 kaliya anand_ts: you can try with different switches or by separating with VLAN. It's very important that the master lies on a network where no other DHCP servers operate
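A quick way to confirm whether another DHCP server is answering on the admin segment is to watch for DHCP traffic from the master. A minimal sketch, assuming eth0 is the admin/PXE interface:

    # any DHCP offer coming from an address other than the Fuel master's
    # indicates a rogue DHCP server on the segment
    tcpdump -i eth0 -n -e port 67 or port 68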
08:41 merdoc kaliya: Hi! do you know anything about a bug that causes 'no host found' when trying to create an instance from a qcow2 image?
08:41 kaliya merdoc: looks like a suitable hypervisor is not found
08:41 merdoc kaliya: yep. but that problem appears ONLY when using qcow2 images
08:42 merdoc if I use a raw image converted from that qcow - everything works
08:42 kaliya merdoc: I can't recall anything related right now... I'll check
08:43 merdoc kaliya: here are some logs http://paste.openstack.org/show/115671/
08:43 anand_ts merdoc: Could you try using scheduler_default_filters = AllHostsFilter in your nova.conf
08:43 merdoc "IOError: [Errno 28] No space left on device" while I got 1.4Tb free space on ceph osds
08:43 merdoc anand_ts: it's not a problem with scheduler
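For completeness, the filter change anand_ts mentions is a nova.conf edit along these lines (a sketch only; as merdoc notes, the scheduler was not the culprit here, and AllHostsFilter disables host filtering entirely):

    # /etc/nova/nova.conf on the controller
    [DEFAULT]
    scheduler_default_filters = AllHostsFilter

    # then restart the scheduler, e.g. on CentOS:
    # service openstack-nova-scheduler restart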
08:44 merdoc kaliya:  it's related to 5.1 only
08:44 kaliya thanks merdoc, now I'm looking into it
08:45 kaliya merdoc: from which kind of deployment? (centos/ubuntu, ha/non-ha, gre/vlan, ceph/lvm)?
08:46 merdoc kaliya: centos, non-ha, neutron-gre, ceph
08:46 kaliya thanks merdoc, did you enable swift also?
08:46 merdoc 1 sec
08:47 merdoc no
08:48 merdoc kaliya: also I set to true 'use qcow format for images' in scheduler settings
08:49 merdoc storage - cinder, glance and nova on ceph
08:49 Rajbir joined #fuel
08:49 merdoc ceph radosgw disabled
08:50 kaliya merdoc: thanks now I will try, I already have the same env running
08:51 merdoc kaliya: there is one thing - preinstalled cirros qcow image works fine!
08:51 merdoc but my image with win serv 2012 r2 - won't
08:51 kaliya merdoc: ok I will try to make a new one in qcow2 and see :)
08:56 e0ne joined #fuel
09:02 akupko joined #fuel
09:02 kaliya merdoc: with which container format? please run glance image-list
09:03 merdoc kaliya: | 96555c9a-0e6d-4be3-ab02-29079912eed4 | ws12   | qcow2       | bare             | 8547008512  | active |
09:05 merdoc kaliya: also, while using glance I got warning - '/usr/lib/python2.6/site-packages/eventlet/hubs/__init__.py:8: UserWarning: Module backports was already imported from /usr/lib64/python2.6/site-packages/backports/__init__.pyc, but /usr/lib/python2.6/site-packages is being added to sys.path'
09:05 kaliya merdoc: yes this is a known warning/bug
09:05 merdoc ok
09:07 kaliya merdoc: I cannot reproduce it; both cirros and fedora.qcow are working fine
09:07 kaliya with which flavor are you trying to allocate the vm?
09:09 merdoc kaliya: m1.small, m1.medium
09:09 merdoc on 5.0.1 that qcow image worked fine
09:09 kaliya so are you sure your win.qcow2 boots fine
09:09 kaliya ah
09:09 merdoc cirros also works for me now
09:10 kaliya cirros.qcow?
09:10 merdoc yes
09:11 merdoc preinstalled one -> c57f6c04-7354-40b1-9b45-2193319936a2 | TestVM | qcow2       | bare             | 13167616    | active
09:11 kaliya sure you had enough space? from your log the error says: IOError: [Errno 28] No space left on device
09:11 merdoc kaliya: 1.4Tb on ceph osds, I think it's more than enough (%
09:12 merdoc raw image works
09:12 kaliya merdoc: in case yes )
09:14 merdoc kaliya: our Dr_drache also got the same error (%
09:16 merdoc http://irclog.perlgeek.de/fuel/2014-09-25#i_9411954 here's some of his problem. He hit the same issue, but with different logs
09:17 kaliya thanks merdoc
09:17 Longgeek joined #fuel
09:18 teran joined #fuel
09:19 kaliya merdoc: you just uploaded your qcow file through horizon -> images panel right?
09:20 merdoc kaliya: no, I upload it from CLI via glance image-create
09:20 kaliya ok
09:20 merdoc as well as raw image
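A CLI upload along the lines merdoc describes would look roughly like this (image names and file paths are illustrative):

    # qcow2 variant
    glance image-create --name ws12 --disk-format qcow2 \
      --container-format bare --file ws12.qcow2

    # raw variant, then verify
    glance image-create --name ws12-raw --disk-format raw \
      --container-format bare --file ws12.raw
    glance image-list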
09:33 kaliya joined #fuel
09:39 teran joined #fuel
09:45 flor3k joined #fuel
09:49 Rajbir I've upgraded my fuel node to 5.1 from 5.0.1 and now when I try to create a new environment and add nodes
09:49 Rajbir I'm getting an exception
09:49 Rajbir [7f0836a62740] (logger) Response code '500 Internal Server Error' for PUT /api/nodes from 172.17.42.1:57984
09:49 Rajbir File "/usr/lib/python2.6/site-packages/nailgun/network/manager.py", line 457, in get_default_networks_assignment
09:49 Rajbir nics[0]['name']
09:49 Rajbir IndexError: list index out of range
09:49 Rajbir more details here -> http://paste.pugmarks.net/Q99dMLAW
09:49 Rajbir the servers are Dell and I can see 4 NICs on each node from Fuel UI
09:49 Rajbir they show up as eth0, eth1... in fuel UI
09:49 Rajbir these same nodes were used for an environment running fuel 5.0.1
09:49 Rajbir I destroyed that environment and am creating a fresh one with the same nodes
09:49 Rajbir I'm creating a centos 5.1 environment
09:49 Rajbir but even if I try ubuntu 5.1, centos 5.0.2 or centos 5.0.1 I get the same error
09:49 Rajbir I even rebooted the fuel node, still the same
09:49 pal_bth joined #fuel
09:55 kaliya Rajbir: could you please inspect the output of dockerctl logs nailgun?
09:56 Rajbir sure
09:57 Longgeek_ joined #fuel
09:59 pasquier-s joined #fuel
10:01 Rajbir Warning: Could not retrieve fact fqdn
10:01 Rajbir [debug] stdcopy.go:111 framesize: 87
10:01 Rajbir Warning: Config file /etc/puppet/hiera.yaml not found, using Hiera defaults
10:01 Rajbir above are the noticeable warnings that I'm getting
10:01 kaliya Rajbir: those warnings are ok
10:03 Rajbir Warning: Could not retrieve fact fqdn
10:03 Rajbir [debug] stdcopy.go:111 framesize: 73
10:03 Rajbir Warning: Host is missing hostname and/or domain: 82588179cc95
10:03 Rajbir [debug] stdcopy.go:111 framesize: 87
10:03 Rajbir Warning: Config file /etc/puppet/hiera.yaml not found, using Hiera defaults
10:03 Rajbir few more
10:04 Rajbir --/usr/lib64/python2.6/site-packages/sqlalchemy/sql/expression.py:1927: SAWarning: The IN-predicate on "nodes.id" was invoked with an empty sequence. This results in a contradiction, which nonetheless can be expensive to evaluate.  Consider alternative strategies for improved performance.
10:07 Rajbir 2014-09-29 11:06:21 ERROR
10:07 Rajbir [7f0836a62740] (logger) Response code '500 Internal Server Error' for PUT /api/nodes from 172.17.42.1:59564
10:07 Rajbir 2014-09-29 11:06:21 ERROR
10:07 Rajbir [7f0836a62740] (logger) Traceback (most recent call last):
10:07 Rajbir File "/usr/lib/python2.6/site-packages/web/application.py", line 239, in process
10:07 Rajbir return self.handle()
10:07 Rajbir File "/usr/lib/python2.6/site-packages/web/application.py", line 230, in handle
10:07 Rajbir return self._delegate(fn, self.fvars, args)
10:07 Rajbir File "/usr/lib/python2.6/site-packages/web/application.py", line 420, in _delegate
10:07 Rajbir return handle_class(cls)
10:07 Rajbir File "/usr/lib/python2.6/site-packages/web/application.py", line 396, in handle_class
10:07 Rajbir return tocall(*args)
10:07 Rajbir File "<string>", line 2, in PUT
10:07 Rajbir File "/usr/lib/python2.6/site-packages/nailgun/api/v1/handlers/base.py", line 93, in content_json
10:08 Rajbir Getting these when trying to add a new compute node.
10:08 kaliya Rajbir: thanks. Could you please pack all those errors into a pastebin? I will ask the devs
10:08 lordd joined #fuel
10:10 teran joined #fuel
10:11 Rajbir Kaliya : http://paste.pugmarks.net/wuv3lkkP pastebin url.
10:20 pasquier-s_ joined #fuel
10:22 flor3k joined #fuel
10:23 kaliya merdoc: I'm looking into it, creating another env with decent ceph storage space
10:24 Rajbir kaliya: do you think it's the same as https://bugs.launchpad.net/fuel/+bug/1273099
10:24 Rajbir how can we purge and clean the db?
10:25 Rajbir the commands don't seem to work anymore since we moved to docker
10:25 Rajbir - /opt/nailgun/bin/manage.py dropdb
10:25 Rajbir - /opt/nailgun/bin/manage.py syncdb
10:25 Rajbir - /opt/nailgun/bin/manage.py loaddefault
10:25 kaliya Rajbir: could fit. Please run dockerctl shell nailgun, and then try to run those commands
10:25 kaliya from inside the container
10:25 Rajbir nope, doesn't work either
10:25 Rajbir inside the docker shell
10:26 Rajbir [root@fuel ~]# dockerctl shell nailgun
10:26 Rajbir [root@0d32de149b1d ~]# /opt/nailgun/bin/manage.py dropdb
10:26 Rajbir bash: /opt/nailgun/bin/manage.py: No such file or directory
10:26 kaliya just `manage.py`? it's in /usr/bin/manage.py
10:27 Rajbir gotcha, thanks
10:27 kaliya Rajbir: please run it. And tell me if it works :)
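Spelled out, the reset sequence runs inside the nailgun container; note that it wipes all nailgun data, as Rajbir points out below:

    dockerctl shell nailgun    # enter the nailgun container
    manage.py dropdb           # drop the nailgun database
    manage.py syncdb           # recreate the schema
    manage.py loaddefault      # reload default fixtures
    exit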
10:29 Rajbir great, it worked
10:30 Rajbir but in the whole episode, we lost all we had before
10:30 Rajbir the purpose of the upgrade is lost
10:30 Rajbir it's like a fresh install
10:30 Rajbir we need to handle this better
10:30 kaliya thanks for the feedback, Rajbir, I will report
10:31 Rajbir it's fine for us since we don't have other environments on this fuel node
10:31 Rajbir if we had, it would have been a mess
10:38 f13o joined #fuel
10:49 merdoc kaliya: ok ok. here's how my typical server looks: http://i.imgur.com/svy1nwR.png
10:53 adanin joined #fuel
11:09 dilyin joined #fuel
11:11 pal_bth joined #fuel
11:20 Rajbir Kaliya : tried to deploy the nodes after purging the dbs but now I'm stuck at this error ::
11:20 Rajbir [468] MCollective call failed in agent 'puppetsync', method 'rsync', failed nodes:
11:20 Rajbir ID: 11 - Reason: Fail to upload folder using command rsync -c -r --delete rsync://10.20.0.2:/puppet/modules/ /etc/puppet/modules/.
11:20 Rajbir Exit code: 12, stderr: rsync: read error: Connection reset by peer (104)
11:20 Rajbir rsync error: error in rsync protocol data stream (code 12) at io.c(759) [receiver=3.0.6]
11:29 evg Rajbir: it seems you've got a problem with the rsync container. Please check if it's running.
11:29 evg Rajbir: dockerctl list -l
11:30 Rajbir evg : yes, it's running
11:30 Rajbir [root@fuel ~]# dockerctl list -l | grep  rsync
11:30 Rajbir rsync        fuel/rsync_5.1           Running      fuel-core-5.1-rsync
11:30 Rajbir [root@fuel ~]#
11:33 Rajbir evg :  I've restarted the rsync container but still getting the same error.
11:36 Rajbir any other ideas ?
11:36 evg Rajbir: do you see rsync traffic on master node?
11:36 evg tcpdump -i ethX -n port 873
11:39 Rajbir Yes, I'm able to see the traffic on master node for rsync
11:41 evg Rajbir: dockerctl shell rsync ip a
11:41 evg Rajbir: iptables -L -n -t nat | grep 873
11:42 Rajbir [root@fuel ~]# iptables -L -n -t nat | grep 873
11:42 Rajbir DNAT       tcp  --  0.0.0.0/0            127.0.0.1           tcp dpt:873 to:172.17.0.11:873
11:42 Rajbir DNAT       tcp  --  0.0.0.0/0            10.20.0.2           tcp dpt:873 to:172.17.0.11:873
11:43 evg Rajbir: dockerctl shell rsync ip a
11:43 Rajbir [root@fuel ~]# dockerctl shell rsync ip a
11:43 Rajbir 41: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
11:43 Rajbir link/ether e2:90:f9:34:b6:5e brd ff:ff:ff:ff:ff:ff
11:43 Rajbir inet 172.17.0.11/16 scope global eth0
11:43 Rajbir inet6 fe80::e090:f9ff:fe34:b65e/64 scope link
11:43 Rajbir valid_lft forever preferred_lft forever
11:43 Rajbir 43: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
11:43 Rajbir link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
11:43 Rajbir inet 127.0.0.1/8 scope host lo
11:43 Rajbir inet6 ::1/128 scope host
11:43 Rajbir valid_lft forever preferred_lft forever
11:44 anand_ts Rajbir : use a pastebin to show the output. paste.openstack.org will be handy
11:45 kaliya Rajbir: please use pastebin
11:45 Rajbir alright, will do.
11:46 evg Rajbir: ok. let's go to rsync container and see dump there
11:47 evg Rajbir: dockerctl shell rsync ;;; yum install tcpdump ;;; tcpdump -n -i eth0 port 873
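Spelled out, that is a capture from inside the rsync container, plus (as an extra check) a manual listing of the puppet module tree from a target node; the master IP comes from the rsync error above:

    # on the master, inside the rsync container
    dockerctl shell rsync
    yum install -y tcpdump
    tcpdump -n -i eth0 port 873

    # on a target node, list what the master's rsync module serves
    rsync rsync://10.20.0.2/puppet/modules/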
11:47 Rajbir http://paste.pugmarks.net/wLbTe7id
11:48 Rajbir alright
11:51 monester joined #fuel
11:51 evg Rajbir: ?
11:51 Rajbir that's the url for pastebin
11:52 Rajbir I'll be updating with  the tcpdump output in a minute.
11:53 Longgeek joined #fuel
11:53 evg Rajbir: ok
11:54 Rajbir http://paste.pugmarks.net/G0MlGWOx
11:54 Rajbir above URL contains tcpdump output
11:55 harybahh joined #fuel
11:57 evg Rajbir: that's not quite what I asked for (it should be from inside the container), but it shows rsync works well
11:58 Rajbir evg : Do you want me to provide the output from inside the rsync container ?
11:59 evg Rajbir: I meant so, but now I see rsync works
11:59 Rajbir yeah, rsync is working fine
12:00 Rajbir Any other ideas as to what it might be ?
12:02 evg Rajbir: couldn't it be some temporary network problems?
12:03 Rajbir Let me check once.
12:04 pasquier-s joined #fuel
12:05 teran_ joined #fuel
12:09 Rajbir tried pinging all the ips from the fuel master and no packet loss for any node, so it doesn't seem to be a network issue
12:10 Dr_drache joined #fuel
12:10 Dr_drache happy monday
12:11 kaliya hi Dr_drache
12:11 Dr_drache kaliya, hello, time to fix ceph :P
12:11 merdoc :D
12:11 kaliya Dr_drache: lol )
12:11 merdoc Dr_drache: we already tried to fix it (%
12:11 Dr_drache merdoc, no good?
12:11 merdoc Dr_drache: currently kaliya can't reproduce it (%
12:12 Dr_drache well, I just redeployed.
12:12 Dr_drache lets see if I can
12:13 kaliya Dr_drache: I know you had problems booting instances, but only from qcow formats
12:13 kaliya Dr_drache: did you try with windows or other operating systems?
12:14 Dr_drache kaliya, with windows and linux, and actually my ceph crashed after that late friday.
12:14 Dr_drache I have stuck/bad pgs
12:14 Dr_drache or had
12:14 merdoc hm, my ceph still works. let's hope it is not related (%
12:14 kaliya ok
12:15 Dr_drache merdoc, it just "stopped": couldn't do anything and it just stayed degraded, couldn't even delete images.
12:17 merdoc Dr_drache: that happened to me when I restarted the controller with ceph-mon
12:17 Dr_drache I tried a restart after.
12:17 merdoc so I had to restart all my clusters after that
12:17 Dr_drache that obviously broke more stuff
12:19 evg Rajbir: are you able to ssh to this unlucky node?
12:21 Rajbir yes.
12:21 merdoc it looked like the osds couldn't 'speak' with the mon, but ceph status showed HEALTH_OK, so I could not remove any instance. and when I restarted the osds they started to delete it
12:21 Dr_drache merdoc; kaliya http://i.imgur.com/52BXvm4.png <--- should that be used for ceph?
12:21 Dr_drache merdoc, no amount of restarting OSDs fixed that for me
12:21 merdoc Dr_drache: dunno, but I have it checked too
12:21 Dr_drache merdoc, it's confusing, ceph runs in raw.
12:22 merdoc yes, I know
12:22 Dr_drache but no snapshotting for raw?
12:22 merdoc exactly
12:22 Dr_drache so, that means no snapshotting on ceph?
12:22 merdoc not exactly (%
12:23 Dr_drache I know, not exactly, but... even we smart people are a bit confused by it
12:23 merdoc I'm now trying to create an instance from raw and get a snapshot of it
12:25 Dr_drache thanks
12:25 Dr_drache I was going to try that next
12:26 Dr_drache but uploading a qcow2 to test with
12:28 lordd joined #fuel
12:30 Rajbir evg : managed to fix the issue with rsync but now getting the below error ::
12:30 lordd joined #fuel
12:30 Rajbir evg :  Can  you please help me with the installation order ?
12:31 Rajbir Because it's trying to install Zabbix first and then mongo  instead of Controller first
12:31 azemlyanov joined #fuel
12:32 merdoc Dr_drache, kaliya snapshot from raw failed %(
12:32 Rajbir evg :: http://paste.pugmarks.net/X0UOWFs5 >> error url
12:32 merdoc what log can show me THE TRUTH? (%
12:33 evg Rajbir: I've lost the thread, what is your last issue now? And what was wrong with rsync?
12:34 Dr_drache merdoc, that sucks
12:35 Dr_drache this is why I dislike automated test scripts.
12:36 Rajbir evg : for rsync, I've followed the steps provided by sergiy here https://bugs.launchpad.net/fuel/+bug/1355281
12:37 Rajbir now while trying to deploy the nodes, I'm getting error for glance [ Error running RPC method deploy: Disabling the upload of disk image because glance was not installed properly,]
12:37 Rajbir this is the pastebin URL : http://paste.pugmarks.net/X0UOWFs5
12:38 lordd joined #fuel
12:42 Dr_drache merdoc, so. maybe ceph is broken :P
12:42 merdoc Dr_drache: so why did it work on 5.0.1? (%
12:46 Dr_drache merdoc, seems it's still broken to me.
12:46 Dr_drache unless I'm expecting to boot from an image quicker than 5 min
12:47 Dr_drache ceph is degraded.
12:47 Dr_drache by making ONE non-cirros instance
12:47 Dr_drache kaliya
12:48 kaliya what `ceph status` does say Dr_drache
12:48 Dr_drache kaliya, let me screenshot it
12:49 kaliya ok thanks Dr_drache
12:50 Dr_drache kaliya : http://paste.openstack.org/show/116684/
12:51 kaliya Dr_drache: on how many nodes? `ceph osd tree`
12:52 Dr_drache kaliya - fresh deploy with a clean ceph, then I CLI uploaded a qcow2, and then tried to boot from that image.
12:52 Dr_drache http://paste.openstack.org/show/116685/
12:52 kaliya Dr_drache: I've done the same with a brand-new env. Imported via CLI a qcow2 image and created an instance. I worked with Fedora20
12:53 kaliya I see
12:53 Dr_drache this was a known working qcow2 windows.
12:53 Dr_drache as in, the same qcow2 is in production.
12:53 kaliya so did you try in 5.0.1 and worked fine (as merdoc says)?
12:54 Dr_drache I think it did, but I didn't do much with 5.0.1 before I went to nightlies; I thought 5.1 would fix some networking issues
12:55 Dr_drache kaliya, and I thought this was just a bad deployment,  so I redeployed.
12:57 kaliya Dr_drache: got it
12:57 kaliya root
12:57 kaliya (sorry wrong windows)
13:02 evg Rajbir: There are a couple of bugs related. Please look here https://bugs.launchpad.net/fuel/+bug/1359282
13:02 Dr_drache lol
13:03 evg Rajbir: looks like your case
13:03 Rajbir yeah, already read that but not of much of help :(
13:03 evg Rajbir: the fix is proposed
13:04 evg Rajbir: is it happening constantly or by chance?
13:05 Rajbir evg : I've upgraded fuel to 5.1 and since then I'm running into issues deploying nodes
13:13 evg Rajbir: yes, it seems a new bug.
13:13 Rajbir yeah, it does seem to be
13:13 Dr_drache I've sold my boss on 5.1 features. it sucks that it's completely unusable for me.
13:14 Rajbir :D
13:14 Dr_drache kaliya, need anything else before I redeploy?
13:14 evg Rajbir: let us try to fix it.
13:14 Rajbir Yeah
13:15 Dr_drache merdoc, going to redeploy without the qcow2 selected.
13:15 kaliya Dr_drache: what's the exact error in nova logs?
13:16 Dr_drache kaliya :
13:16 Dr_drache http://paste.openstack.org/show/116690/
13:16 Dr_drache it seems to be 100% ceph
13:17 evg Rajbir: please describe what you have now. Am I right that you're now on a new clean deployment?
13:17 kaliya Dr_drache: yes, no space
13:17 Rajbir Yes, I'm on  a clean new deployment
13:17 kaliya Dr_drache: what's `nova hypervisor-stats` ?
13:18 Rajbir evg : I'm still stuck on the glance upload issue
13:18 evg Rajbir: Not a single node is deployed?
13:18 Dr_drache kaliya : http://paste.openstack.org/show/116696/
13:22 merdoc Dr_drache: ok. I can't do that for now here %(
13:25 Dr_drache merdoc, worth a shot
13:25 Rajbir evg : http://paste.openstack.org/raw/116698/
13:25 sc-rm Have any of you experienced that a restart of a compute node (HP Proliant) results in Fuel wanting to deploy disk configuration changes?
13:25 Rajbir this should help
13:26 kaliya Dr_drache: is `ceph health detail` giving you anything deeper? Or check /var/log/ceph on your ceph nodes. Can we exclude possible hw failures (dmesg)?
13:27 Dr_drache kaliya : root@node-3:~# ceph health detail
13:27 Dr_drache HEALTH_WARN 124 pgs degraded; 241 pgs stuck unclean; recovery 226/6489 objects degraded (3.483%)
13:27 Dr_drache pg 3.db is stuck unclean for 228860.232311, current state active+remapped, last acting [7,6,0]
13:27 Dr_drache pg 3.d4 is stuck unclean since forever, current state active+remapped, last acting [9,8,0]
13:27 Dr_drache pg 4.d2 is stuck unclean since forever, current state active+remapped, last acting [8,5,0]
13:27 Dr_drache pg 10.de is stuck unclean since forever, current state active+degraded, last acting [9,8]
13:27 Dr_drache pg 10.d9 is stuck unclean since forever, current state active+degraded, last acting [4,9]
13:27 Dr_drache pg 4.ca is stuck unclean since forever, current state active+remapped, last acting [4,7,0]
13:27 Dr_drache pg 7.cb is stuck unclean since forever, current state active+degraded, last acting [9,4]
13:27 Dr_drache pg 8.c4 is stuck unclean since forever, current state active+degraded, last acting [6,9]
13:27 Dr_drache pg 7.cf is stuck unclean since forever, current state active+degraded, last acting [8,9]
13:27 Dr_drache pg 7.c0 is stuck unclean since forever, current state active+degraded, last acting [5,4]
13:27 Dr_drache pg 4.c1 is stuck unclean since forever, current state active+remapped, last acting [6,5,0]
13:27 Dr_drache pg 3.c6 is stuck unclean since forever, current state active+remapped, last acting [5,4,0]
13:27 Dr_drache pg 3.c0 is stuck unclean since forever, current state active+remapped, last acting [9,4,0]
13:27 Dr_drache pg 4.c6 is stuck unclean since forever, current state active+remapped, last acting [5,8,0]
13:27 Dr_drache pg 5.84 is stuck unclean since forever, current state active+remapped, last acting [8,9,1]
13:27 Dr_drache pg 4.84 is stuck unclean since forever, current state active+remapped, last acting [9,2,0]
13:27 Dr_drache pg 8.77 is stuck unclean since forever, current state active+degraded, last acting [4,5]
13:28 Dr_drache pg 4.78 is stuck unclean since forever, current state active+remapped, last acting [4,5,0]
13:28 Dr_drache pg 8.73 is stuck unclean since forever, current state active+degraded, last acting [9,8]
13:28 Dr_drache pg 7.7d is stuck unclean since forever, current state active+degraded, last acting [9,8]
13:29 Dr_drache joined #fuel
13:29 Dr_drache hopefully I didn't flood too bad
13:29 kaliya ok Dr_drache
13:29 kaliya please use pastebin :)
13:29 Dr_drache kaliya : http://paste.openstack.org/show/116704/
13:29 Dr_drache checking dmesg on all nodes now
13:32 Dr_drache kaliya, the only message in dmesg : http://paste.openstack.org/show/116708/ - on all nodes
13:32 Dr_drache kaliya, it was a mistake to post that here.... sorry (the flood)
13:32 kaliya Dr_drache: no problem
13:34 cvieri joined #fuel
13:36 kaliya Dr_drache: `ceph pg stat` ?
13:36 Dr_drache http://paste.openstack.org/show/116710/
13:37 evg Rajbir: your paste is really regrettable. Did all the nodes fail with this error (glance)?
13:38 Rajbir let me check
13:40 Rajbir evg : I'm getting those errors in the astute logs when clicking on Deploy changes from http://ip:8000
13:40 kaliya Dr_drache: the problem may arise from the qcow -> raw allocation. Ceph tries to expand the qcow image to the raw maximum
13:40 kaliya Dr_drache: but now we need to troubleshoot what's going wrong on the ceph install... let's look into /var/log/ceph
13:42 Dr_drache nothing in there I think helps
13:43 kaliya Dr_drache: do you have the cirros raw? could you please try to boot?
13:43 bogdando joined #fuel
13:43 Dr_drache kaliya, it's not a raw, but yes.
13:43 Dr_drache last time cirros would work
13:43 kaliya Dr_drache: yes s/raw/qcow/ sorry
13:44 Dr_drache used same settings as the windows, it's going now
13:45 kaliya Dr_drache: ok. What flavour are you associating to the windows instances?
13:46 Dr_drache medium
13:46 Dr_drache launched fine
13:47 kaliya Dr_drache merdoc how big is the windows evaluation image?
13:51 merdoc kaliya: cirros.qcow boots normally
13:51 merdoc on any flavor
13:52 Dr_drache kaliya, 40gb
13:52 merdoc kaliya: I did step-by-step from that manual http://docs.openstack.org/image-guide/content/windows-image.html so my win.qcow is 10Gb. win.raw after conversion - 8Gb
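The conversion merdoc refers to is normally done with qemu-img (file names illustrative); qemu-img info also shows the virtual size that gets fully allocated when a qcow2 is flattened to raw on ceph:

    qemu-img convert -f qcow2 -O raw win2012.qcow2 win2012.raw
    qemu-img info win2012.qcow2    # compare virtual size vs. disk size
    qemu-img info win2012.raw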
13:53 merdoc flavor - medium, with 20Gb root partition
13:54 Dr_drache I have 40gb as my min, because that's my qcow's max size
13:54 Dr_drache (or configured size)
13:56 Dr_drache a bigger one fails
13:56 youellet joined #fuel
13:56 Dr_drache and so does a smaller one.
13:56 Topic for #fuel is now Fuel 5.1 for Openstack: https://wiki.openstack.org/wiki/Fuel | Paste here http://paste.openstack.org/ | IRC logs http://irclog.perlgeek.de/fuel/
14:00 kaliya Today I worked with a fedora-cloud qcow all day long on a similar env. I will try with the win image
14:01 merdoc kaliya: where do you get the fedora qcow? can I download it and try it in my env?
14:03 kaliya merdoc: you can just download from here http://fedoraproject.org/en/get-fedora#clouds
14:03 kaliya merdoc: it's a more or less complete operating system, a bit heavier than cirros
14:07 Dr_drache I can try a raw
14:08 kaliya Dr_drache: yes please try
14:09 Dr_drache not confident it will work
14:09 Dr_drache my ceph is already degraded.
14:10 kaliya Dr_drache: your ceph nodes are all running, reachable and ok right?
14:10 Dr_drache yes
14:10 Dr_drache ok other than the pgs issue
14:13 MiroslavAnashkin kupo24z: Yes, Jenkins builds new community ISO every day and it includes the latest stable branches of the repos. So, package for this bug https://bugs.launchpad.net/mos/+bug/1371723 should be already in the latest 5.1 ISO
14:17 merdoc kaliya: fedora.qcow starts
14:18 AKirilochkin joined #fuel
14:21 merdoc kaliya: maybe the problem is in the qcow size?
14:22 merdoc I will try to create a linux img bigger than 1gb
14:23 kaliya merdoc: I tried with 10G
14:24 merdoc weird
14:25 kaliya imho, there's much weirdness when windows is involved
14:25 Dr_drache ok
14:25 kaliya :)
14:25 Dr_drache my raw works perfectly
14:25 kaliya Dr_drache: in ceph, raw is recommended. We should clarify this better in our documentation
14:26 Dr_drache it should also not allow the qcow option when everything is on ceph
14:27 Dr_drache let me test again with my windows converted to raw
14:27 merdoc yep, but I have an irrefutable argument - this was working on 5.0.1 ((%
14:27 Dr_drache it did
14:28 Dr_drache and worked well on 4.x
14:28 jobewan joined #fuel
14:28 Dr_drache I bet it's ceph firefly
14:30 Dr_drache anyway
14:30 Dr_drache this is going to take a bit
14:30 Dr_drache only on 1gbps network
14:30 Dr_drache 40gb file
14:42 Dr_drache uploading the raw
14:42 anand_ts left #fuel
14:45 azemlyanov joined #fuel
14:46 azemlyanov joined #fuel
15:00 gals joined #fuel
15:03 harybahh joined #fuel
15:17 blahRus joined #fuel
15:20 lordd joined #fuel
15:22 Dr_drache kaliya, merdoc, attempting raw windows
15:22 kaliya ok Dr_drache
15:22 Dr_drache kaliya, boots when booting directly from image.
15:23 kaliya so we can isolate to ceph+qcow Dr_drache?
15:23 Dr_drache kaliya, I think so, let me test a different way please.
15:23 Dr_drache trying to boot from image, but create new
15:25 mattgriffin joined #fuel
15:27 rmoe joined #fuel
15:29 Dr_drache kaliya, only works if you boot from the image.
15:30 Dr_drache cannot create new image from the old.
15:30 Dr_drache so, that's not just qcow2. kinda kills ceph.
15:31 Dr_drache or openstack practices.
15:31 AKirilochkin joined #fuel
15:36 Dr_drache merdoc
15:37 Dr_drache sir, have you attempted : http://i.imgur.com/9R5fBJD.png
15:42 kiwnix joined #fuel
15:46 Dr_drache kaliya, it seems only cinder can create images
15:46 Dr_drache lol
15:50 kaliya Dr_drache: is your ceph still degraded?
15:50 Dr_drache worse
15:51 Dr_drache http://paste.openstack.org/show/116754/
15:56 lordd joined #fuel
16:04 Dr_drache kaliya, cluster pretty much dead when it comes to storage
16:04 Miroslav_ joined #fuel
16:05 lordd joined #fuel
16:07 AKirilochkin joined #fuel
16:08 lordd_ joined #fuel
16:10 AKirilochkin_ joined #fuel
16:10 Dr_drache kaliya : safe to redeploy?
16:18 ArminderS joined #fuel
16:51 stamak joined #fuel
16:54 jpf joined #fuel
17:01 jobewan joined #fuel
17:01 lordd joined #fuel
17:14 angdraug joined #fuel
17:19 youellet joined #fuel
17:19 kupo24z MiroslavAnashkin: Any update on the new community build? last one was on sep 26th
17:28 angdraug kupo24z: still rearranging jenkins jobs for juno/6.0 work
17:29 kupo24z angdraug: are you guys still planning on doing 5.1/stable builds?
17:29 angdraug yes: https://fuel-jenkins.mirantis.com/view/ISO/job/fuel_community_5_1_iso/
17:30 kupo24z first 6.0 build going to be released on tech preview of oct 23rd? https://wiki.openstack.org/wiki/Fuel/5.1_Release_Schedule
17:31 kupo24z Ah nevermind, that got moved to oct 30th
17:31 kupo24z https://wiki.openstack.org/wiki/Fuel/6.0_Release_Schedule
17:32 angdraug yup
17:34 kupo24z angdraug: when do you expect the next build of 5.1/stable to be started for community?
17:35 lordd joined #fuel
17:37 Dr_drache blah
17:37 Dr_drache kaliya, redeployed.
17:38 Dr_drache what's the workaround for snapshots, since raw snapshots are not supported?
17:38 angdraug kupo24z: not sure, maybe bookwar knows
17:38 kupo24z angdraug: or can you verify if https://bugs.launchpad.net/mos/+bug/1371723 is committed to the latest build on the 26th?
17:38 angdraug funny I was just looking into that
17:39 angdraug no it's not there yet
17:39 angdraug place to watch is http://fuel-repository.mirantis.com/fwm/5.1.1/ubuntu/pool/main/
17:39 kupo24z sad panda
17:39 angdraug updated oslo.messaging should land there
17:39 angdraug you and me both :(
17:40 kupo24z is xarses on holiday? haven't seen him around lately
17:40 angdraug yup, he is
17:40 Dr_drache either of you guys know of a workaround for snapshots for ceph, since qcows seem to be broken
17:40 angdraug he's back next week
17:41 angdraug Dr_drache: you have to narrow it down
17:41 angdraug different snapshot code paths in rbd driver for different openstack components have different bugs )
17:41 Dr_drache angdraug, right now, using qcow2 on ceph seems to actually kill my ceph.
17:42 Dr_drache I can only boot from raw images, which don't support snapshotting... so, there has to be a workaround
17:42 angdraug qcow2 doesn't work with ceph, nova is supposed to autoconvert it to raw
17:42 angdraug ceph does snapshotting raw image based volumes at the rbd level
17:43 angdraug the only problem is that glance is too broken for this to be very useful in openstack context
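At the rbd layer, the snapshot/clone cycle angdraug mentions looks roughly like this (pool and image names are illustrative; cloning needs a protected snapshot and a format-2 image):

    rbd -p compute snap create vm-disk@base          # snapshot the image
    rbd -p compute snap protect vm-disk@base         # protect it so it can be cloned
    rbd clone compute/vm-disk@base compute/vm-clone  # copy-on-write clone
    rbd -p compute ls -l                             # list images, snaps and clones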
17:43 Dr_drache well, I can test a different way once this redeploys
17:44 e0ne joined #fuel
17:44 Dr_drache I don't know how much of my conversation with kaliya you are aware of
17:44 mpetason joined #fuel
17:44 angdraug I think the only place where it works the way it's supposed to is cinder volume backed instances
17:44 angdraug none of it, going to have to read the log
17:46 Dr_drache let me know if you want me to fill you in.
17:46 kupo24z afaik snapshotting with ceph was some of the 6.0/juno changes
17:46 kupo24z but right now its just a raw copy of the full disk
17:47 Dr_drache kupo24z, that's fine as well.
17:47 angdraug a bunch of fixes I've made for rbd cow support were merged to nova in juno if that's what you're referring to
17:47 angdraug but we already had most of these in mos since 4.1
17:47 Dr_drache if my image can be "copied" to make new instances, that's all I want to work.
17:47 kupo24z I thought there was a plan to make snapshots with ceph rbd copy-on-write clones?
17:47 kupo24z or was that for volumes only
17:48 Dr_drache can't do that right now with qcow2, or raw.
17:48 Dr_drache with my last deploy.
17:48 angdraug only works right with volumes
17:48 kupo24z Dr_drache: that should be possible; making a snapshot will put the instance image in glance, which you can spawn new ones from. are you getting boot errors?
17:48 Dr_drache kupo24z, getting out-of-space errors.
17:49 kupo24z Dr_drache: are you using volumes or ephemeral disks?
17:49 Dr_drache or pgs errors
17:49 angdraug I've seen a couple of librbd related bugs fixed in latest dumpling builds
17:49 Dr_drache kupo24z, volumes/images. http://i.imgur.com/9R5fBJD.png or just "boot from image"
17:49 kupo24z angdraug: would it be terribly difficult to copy that code to ephemeral? or is it working as intended
17:49 angdraug http://ceph.com/docs/master/release-notes/#v0-67-11-dumpling
17:50 kupo24z angdraug: afaik Dr_drache is on 5.1 which would be the later version
17:50 Dr_drache yes, 5.1
17:50 angdraug kupo24z: the problem is with glance, it's fundamentally not suited to support storage backend side snapshots
17:50 Dr_drache right now, after 3 redeploys, I'm frustrated.
17:51 Dr_drache cirros OS, I can make images all day, any size I want.
17:51 kupo24z oh yeah i think we had a discussion about this months ago
17:51 angdraug the reason cinder snapshots work is because it's not using glance. if you try to create an image from volume, you get the same copy problem
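A volume-backed workflow along those lines, with the cinder CLI of that era (IDs, names and sizes are illustrative):

    # create a bootable volume from a glance image
    cinder create --image-id <image-uuid> --display-name win-vol 40

    # snapshot it and spawn a new volume from the snapshot
    cinder snapshot-create --display-name win-snap <volume-uuid>
    cinder create --snapshot-id <snapshot-uuid> --display-name win-vol-2 40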
17:51 Dr_drache raw and qcow2 images (known good from my KVM cluster) "kill" ceph
17:51 kupo24z Dr_drache: what does ceph -s say?
17:52 angdraug http://tracker.ceph.com/issues/8845
17:52 angdraug http://tracker.ceph.com/issues/8912
17:52 angdraug both affect firefly too
17:52 Dr_drache kupo24z; starts with this :
17:52 Dr_drache http://paste.openstack.org/show/116704/
17:52 Dr_drache then it filters to this :
17:52 Dr_drache http://paste.openstack.org/show/116754/
17:53 kupo24z 'ceph osd tree' are any offline?
17:53 Dr_drache kupo24z, no.
17:53 Dr_drache waiting for my redeploy, to try again
17:53 kupo24z What are your replication and pg_num/pgp_num set to?
17:54 Dr_drache 3 and "fuel default"
17:54 Dr_drache lol
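For reference, the values in question can be read straight from the cluster (pool names vary by deployment; volumes/images/compute are the usual Fuel-created pools):

    ceph osd dump | grep 'replicated size'   # replication size per pool
    ceph osd pool get volumes pg_num
    ceph osd pool get volumes pgp_num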
17:54 kupo24z I would check out #ceph on irc as well http://ceph.com/resources/mailing-list-irc/
17:55 kupo24z How many osd's do you have per node?
17:55 Dr_drache 3 per node IIRC.
17:56 kupo24z and how many storage nodes ?
17:56 Dr_drache all my nodes were storage nodes, so 3.
17:56 kupo24z 'all nodes' including controller?
17:56 Dr_drache yes
17:57 kupo24z and you have a single controller setup or HA?
17:57 angdraug just checked, of those 2 rbd bug fixes I linked above, one isn't even present in firefly branch (so not backported yet)
17:57 angdraug the other is backported, but after 0.80.5 was released
17:57 Dr_drache HA, single controller. I was planning on testing the ability to scale out.
17:57 kupo24z Basically you should never run osd's on the controller nodes, since they are the monitors
17:57 kupo24z but idk if it's related to your issue
17:58 Dr_drache I was told it's not a GREAT idea... but that unless loaded it wouldn't cause a problem.
17:58 kupo24z on your next reload try replication of 2 and don't run osd's on the controller
17:58 Dr_drache doesn't replication of 2 hurt ceph more though?
17:59 kupo24z For your small cluster it's better to have 2 and not run osd's on the monitor
18:00 Dr_drache I will be scaling to 12-18 nodes straight from a production test.
18:00 Dr_drache of course, that's a moot point right now.
18:03 kupo24z Basically you are not following standard ceph best practices, so it's advised to get that fixed before moving forward with troubleshooting
18:04 jpf joined #fuel
18:05 Dr_drache so, fix one non-best practice by doing another non-best practice; how does that make any sense?
18:07 Dr_drache not saying I won't give it a shot, but that's pretty bad.
18:08 jobewan joined #fuel
18:10 emagana joined #fuel
18:13 kupo24z Dr_drache: a repl of 3 vs 2 is just a failure risk level change, not service impacting at all and on a test cluster it is perfectly fine. Running OSD's on the monitor can cause serious issues and should always be avoided
18:14 Dr_drache so, really, that option should be taken out of fuel.
18:15 Dr_drache I was told the same, and read the same last spring about a repl of 2.
18:17 Dr_drache that the ONLY reason you'd run 2, is for virtual box.
18:17 kupo24z the replication value is just how many failures you are comfortable with, so it totally depends on your cluster setup
18:17 Dr_drache and that it's a perfectly acceptable solution to have a controller as an OSD.
18:18 Dr_drache so, if it IS actually bad practice, someone should tell mirantis to stop suggesting it.
18:20 Dr_drache either way, I cancelled this redeployment
18:20 Dr_drache and I'm going with 2, and took the controller out of the OSDs.
18:22 teran joined #fuel
18:30 kupo24z bookwar: you around?
18:50 adanin joined #fuel
18:54 stamak joined #fuel
18:55 HeOS joined #fuel
19:16 angdraug Dr_drache: http://docs.mirantis.com/openstack/fuel/fuel-5.1/release-notes.html#placing-ceph-osd-on-controller-nodes-is-not-recommended
19:16 angdraug should also be in the planning guide, but at least it's there now :)
19:20 Dr_drache angdraug, sad that I planned my entire cluster on being able to do that.
19:22 Dr_drache oh well
19:23 Dr_drache I guess being new, I was misguided.
19:23 Dr_drache such is life. the worst part about this today is that it worked perfectly on 5.0.1 and 4.x
19:28 angdraug at ceph day in san jose last week, there was a presentation from fujitsu about performance testing they did on Firefly
19:28 angdraug in their tests, osd were prone to a lot of context switching on small block size i/o
19:29 angdraug imagine osd by itself generating load avg 20-30 on your controller just in normal operation
19:29 Dr_drache yea, that's not good.
19:29 angdraug and then you lose a node or a drive and there's a spike for rebalance
19:30 angdraug their conclusion was that osd requires a dedicated 1GHz of CPU cycles per drive
19:30 angdraug assuming you're using something like e5-2630
19:30 Dr_drache I remember when openstack was for "commodity" hardware to create a cloud.
19:31 kupo24z angdraug: for "RAID-1 spans all configured disks on a node" I don't see any options in fuel for Raid, am I missing something?
19:31 Dr_drache yea, that's nowhere near true anymore.
19:31 angdraug e5-2630 is pretty much commodity in server space :)
19:32 Dr_drache if by server space, you mean datacenters, then yes.
19:32 angdraug kupo24z: you're supposed to do your raid setup in bios, fuel doesn't set up software raid for you, yet
19:32 kupo24z angdraug: I've tried this previously and it looks like it doesn't support motherboard-based raid
19:32 kupo24z for the master or the nodes
19:33 kupo24z just lists the normal disks, not the raid at /dev/mapper/volumeID
19:33 angdraug as I said, no software raid in fuel right now
19:33 angdraug hardware raid, it will see the array as a single block device
19:34 kupo24z Okay, i guess you would define software raid as mdadm and also dmraid
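For reference, the mdadm flavour of software raid being discussed is set up roughly like this (device names are illustrative; Fuel does not do this for you today):

    # create a RAID-1 mirror from two partitions
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --detail --scan >> /etc/mdadm.conf   # persist the array definition
    mkfs.ext4 /dev/md0                         # put a filesystem on it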
19:35 kupo24z is there  a blueprint for software raid support?
19:38 e0ne joined #fuel
19:40 jpf_ joined #fuel
19:43 angdraug strangely enough there isn't
19:44 angdraug can you create one?
19:47 angdraug if not, I can do it for you
19:49 kupo24z angdraug: https://blueprints.launchpad.net/fuel/+spec/fuel-software-raid
19:50 Dr_drache thought software raid was a bad idea?
19:51 kupo24z ?
19:52 angdraug thanks
19:52 Dr_drache more points of failure, with very little recovery
19:52 kupo24z Generally it's fine & stable, it's not very good for performance though
19:52 angdraug depends on what you use software raid for
19:53 angdraug root on sw raid makes sense
19:53 kupo24z and no pesky hw raid card exact model replacements needed
19:53 kupo24z plus a lot of servers have no options for hw raid cards, ie micro blade servers
19:54 adanin joined #fuel
19:57 angdraug triaged it as Low priority for "future" release series
19:57 angdraug definitely too late for 6.0 and 6.1 unless you implement it yourself :)
19:58 angdraug but at least now there's a statement out that if somebody implements it we'll take it
19:58 angdraug implementation wise will probably have to be 2 separate blueprints for fuel master and target nodes
19:58 kupo24z If we have some free dev time i'll ask people about it
19:58 angdraug with the latter dependent on image based provisioning, so that we don't have to re-do it
19:59 MiroslavAnashkin Maybe it's still not too late - we are going to change the whole provisioning in 6.0
19:59 angdraug good point
19:59 angdraug kupo24z: come to fuel irc meeting on thursday, or try to chat up warpc when he's online
19:59 angdraug sorry, not warpc, vkozhukalov
20:01 MiroslavAnashkin And we are going to support the HP B320i/B120i RAID line, and these are actually very advanced soft RAID controllers.
20:01 Dr_drache angdraug, kupo24z - best way to go about testing this?
20:03 Dr_drache obviously, no qcow2
20:03 kupo24z did your redeployment finish? is your cluster healthy?
20:04 Dr_drache kupo24z, double checking the health right now.
20:10 Dr_drache kupo24z all "looks" good
20:23 angdraug in the meantime, community iso builds should restart tonight
20:25 kupo24z yay, looking forward to testing those fixes
20:26 kupo24z Any idea when the new oslo messaging will be on the repo?
20:28 Dr_drache so, almost finished uploading my raw image, a nice 10G raw.
20:35 Dr_drache well, I bet I did that wrong.
20:36 Dr_drache angdraug; kupo24z
20:36 Dr_drache http://i.imgur.com/btPT4Je.png
20:36 Dr_drache is that wrong?
20:38 Dr_drache http://paste.openstack.org/show/116844/ <--- error
20:43 emagana joined #fuel
20:44 emagana joined #fuel
20:44 emagana joined #fuel
20:56 e0ne joined #fuel
21:04 kupo24z Dr_drache: do you have Ceph RBD for volumes checked in fuel settings?
21:05 kupo24z Dr_drache: how large is that image you are sourcing from
22:10 emagana joined #fuel
22:33 e0ne joined #fuel
22:34 teran joined #fuel
22:35 teran_ joined #fuel
22:37 emagana joined #fuel
23:17 lordd joined #fuel
23:43 mattgriffin joined #fuel
23:49 rmoe joined #fuel
