
IRC log for #fuel, 2014-01-16


All times shown according to UTC.

Time Nick Message
23:18 ilbot3 joined #fuel
23:18 Topic for #fuel is now Fuel for Openstack: http://fuel.mirantis.com/ | Paste here http://paste.openstack.org/ | IRC logs http://irclog.perlgeek.de/fuel/
23:42 jhurlbert regarding my issue earlier, the problem only happens on ubuntu. there is no issue on centos
00:12 iunruh left #fuel
01:46 rmoe joined #fuel
01:49 angdraug if it's specific to ubuntu, it's definitely not the setup/cleanup race, that one was happening on centos, too
01:50 angdraug one difference between centos and ubuntu that impacts HA is that deb packages start the services on install while rpms don't
01:51 angdraug I'm working on the bug #1264388 that is caused by that difference, but your logs appear to be at a different point during deployment, so not necessarily related
01:51 jhur joined #fuel
01:51 angdraug one more thing you can try is run puppet manually on the node with debug on, see if there's anything useful in the log
01:52 angdraug something like "puppet apply /etc/puppet/manifests/site.pp --debug --evaltrace |tee puppet.log"
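If the debug run produces a lot of output, a quick way to narrow the captured log down to the interesting parts (a generic grep sketch, nothing Fuel-specific) is:

    # scan the captured puppet log for failures and warnings
    grep -iE 'error|warning|fail' puppet.log | less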
01:54 jhur we are getting an error in the Fuel Master/Orchestrator logs about the Cirros image failing to upload: http://paste.openstack.org/show/61322/
01:54 jhur any ideas on a fix?
01:55 angdraug jhur: something must be wrong with your glance
01:55 angdraug check your glance logs,
01:55 angdraug if you're using ceph rbd as glance backend, check if your ceph cluster came up ok
01:56 jhur yes, we are trying to use ceph rbd as the glance backend
01:57 jhur this is the glance log: http://paste.openstack.org/show/61323/
01:57 jhur ill check ceph
01:57 angdraug https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/ceph/README.md
01:57 angdraug have a look at 'verifying the deployment' section
02:00 jhur when i run 'ceph status' the health is ok: http://paste.openstack.org/show/61324/
02:00 jhur angdraug, I will take a look at what you said, thank you
02:18 angdraug jhur: next thing to check after ceph status is lspools, see if you've got all the pools and credentials
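A minimal sketch of that verification, using standard ceph CLI calls; the pool names in the comment ('images' for glance, 'volumes' for cinder) are assumptions based on common OpenStack/Ceph setups, not something confirmed in this log:

    # overall cluster health and membership
    ceph status
    # list pools; a glance/cinder backend normally needs pools such as 'images' and 'volumes'
    ceph osd lspools
    # check that the cephx client keys the services use actually exist
    ceph auth list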
02:19 angdraug hm, your glance log looks like you don't have the sheepdog package
02:21 angdraug try the instructions from https://bugs.launchpad.net/ubuntu/+source/glance/+bug/1202479
02:37 xarses joined #fuel
03:25 jhur is sheepdog required when using ceph?
03:25 jhur also, the log i pasted is from centos, not ubuntu, if that matters
04:09 besem9krispy joined #fuel
04:29 IlyaE joined #fuel
04:33 chengaodi joined #fuel
04:37 chengaodi left #fuel
05:01 IlyaE joined #fuel
05:10 IlyaE joined #fuel
05:53 cppking joined #fuel
05:53 cppking Is this the openstack fuel channel?
05:58 anotchenko joined #fuel
06:03 besem9krispy It is.
06:25 marcinkuzminski joined #fuel
06:25 marcinkuzminski joined #fuel
06:48 bookwar joined #fuel
07:03 cppking when I use the fuel master node to install ubuntu server, it can't retrieve the autoconfig file from http://ip/cblr/svc/op/ks/profile/ubuntu_1204_x86_64; then I checked the fuel master node and found I couldn't access this url either, it said "no mode specified"
07:05 cppking Can the fuel master node specify a mode?
07:18 mihgen joined #fuel
07:19 anotchenko joined #fuel
07:51 e0ne joined #fuel
07:52 anotchenko joined #fuel
08:26 miguitas joined #fuel
08:32 mrasskazov joined #fuel
08:36 cppking left #fuel
08:48 vkozhukalov joined #fuel
08:51 dshulyak joined #fuel
09:01 cppking joined #fuel
09:02 cppking anybody here?
09:02 cppking what's the best message queue service for openstack, rabbitmq / zeromq / Qpid or other? I want the best performance..
09:04 cppking another question: when I use fuel 4.0, I can't install centos or ubuntu through fuel's default cobbler settings. where can I get documentation about how to configure the cobbler "mode" and set kernel parameters?
09:06 anotchenko joined #fuel
09:07 SteAle joined #fuel
09:20 akupko joined #fuel
09:36 geceyuruyucusu joined #fuel
09:36 geceyuruyucusu hi everyone
09:37 geceyuruyucusu I couldn't reach my instance with floating ip even though it is defined on fuel for this openstack environment
09:38 geceyuruyucusu I can ping the gateway but not the floating ips
09:38 geceyuruyucusu any recommendation?
09:48 AndreyDanin geceyuruyucusu, which type of deployment did you choose? NovaNetwork or Neutron?
09:50 AndreyDanin geceyuruyucusu, Where do you run the ping command? Please, describe your network scheme in detail.
09:50 geceyuruyucusu nova
09:50 geceyuruyucusu ok
09:50 geceyuruyucusu I ping from my local computer on which fuel has been installed and openstack is built
09:51 geceyuruyucusu netstat -rn gives:
09:51 geceyuruyucusu 0.0.0.0         10.10.10.1      0.0.0.0         UG        0 0          0 eth0
09:51 geceyuruyucusu 10.10.10.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
09:51 geceyuruyucusu 10.20.0.0       0.0.0.0         255.255.255.0   U         0 0          0 vboxnet0
09:51 geceyuruyucusu 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
09:51 geceyuruyucusu 172.16.0.0      0.0.0.0         255.255.255.0   U         0 0          0 vboxnet1
09:51 geceyuruyucusu 172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 vboxnet2
09:51 geceyuruyucusu floating ip of instance is 172.16.0.128
09:51 AndreyDanin geceyuruyucusu, Is ping allowed in Security Groups?
09:52 geceyuruyucusu I didn't explicitly define that
09:52 AndreyDanin You have to
09:52 geceyuruyucusu additionally, I couldn't ping an ip on the outside network from the instance
09:55 geceyuruyucusu Thanks for your answer AndreyDanin.
09:55 geceyuruyucusu I had totally forgotten to open the port. Now I can ping
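For context, the fix amounts to opening ICMP (and usually SSH) in the default security group. With nova-network and the classic nova CLI of that era it is roughly:

    # allow ping from anywhere
    nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0
    # allow ssh from anywhere
    nova secgroup-add-rule default tcp 22 22 0.0.0.0/0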
09:55 cppking joined #fuel
09:55 cppking left #fuel
09:57 rvyalov joined #fuel
10:15 anotchenko joined #fuel
10:16 rmoe joined #fuel
10:18 anotchenko joined #fuel
10:19 AndreyDanin cppking, 0mq should be the fastest because of C magic.
10:51 tatyana joined #fuel
11:22 e0ne joined #fuel
11:29 e0ne_ joined #fuel
11:41 anotchenko joined #fuel
11:49 teran joined #fuel
11:50 anotchenko joined #fuel
11:58 richardkiene joined #fuel
12:00 e0ne joined #fuel
12:09 e0ne_ joined #fuel
12:10 MiroslavAnashkin joined #fuel
12:27 tatyana_ joined #fuel
12:30 richardkiene I'm actually working on the same install as jhurlbert, and it looks like it did get further with Centos, but it still failed
12:30 richardkiene we're currently getting the following errors from the Orchestrator logs
12:30 richardkiene [2138] 2033cce4-87d6-4cfe-9570-203421c5d536: Upload cirros image failed
12:31 richardkiene [2138] Error running RPC method deploy: Upload cirros image failed, trace: ["/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:303:in `upload_cirros_image'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:46:in `deploy'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:97:in `deploy'",
12:31 richardkiene "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:103:in `dispatch_message'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:70:in `block in dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in
12:31 richardkiene `each_with_index'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:38:in `block (2 levels) in server_loop'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:49:in `block (2 levels) in consume_one'"]
12:31 richardkiene it appears that there may be a couple of open bugs for this issue
12:32 richardkiene https://bugs.launchpad.net/fuel/+bug/1249337
12:33 richardkiene https://bugs.launchpad.net/fuel/+bug/1259501
12:33 richardkiene Bug 1259501 appears related, but the commit message seems to indicate it is for a Swift conflict
12:34 richardkiene We're using Ceph
12:34 anotchenko joined #fuel
12:35 richardkiene I'm getting a dev environment setup for Fuel, so if anyone has pointers that could be more helpful than the stack trace, I'd appreciate some help pointing me in the correct direction :)
13:00 evgeniyl` richardkiene: the logs should contain the image upload command that failed https://github.com/stackforge/fuel-astute/blob/master/lib/astute/orchestrator.rb#L311-L314 try running that command directly on the controller node
13:03 richardkiene sweet, I was just going through that file
13:07 teran joined #fuel
13:13 richardkiene evgeniyl`: Would you expect that to be a glance command?
13:14 richardkiene I'm seeing "... --property murano_image_info='{"title": "Murano Demo", "type": "cirros.demo"}'                  --file '/opt/vm/cirros-x86_64-disk.img'
13:14 richardkiene stdout:
13:14 richardkiene stderr: The request you have made requires authentication. (HTTP 401)
13:14 richardkiene exit code: 1"
13:18 richardkiene which I suppose would explain the failed cirros image
13:20 teran joined #fuel
13:26 tatyana joined #fuel
13:32 richardkiene Unfortunately I cannot get to the fuel node ATM because someone didn't leave me a password behind :)
13:35 evgeniyl` richardkiene: yes, you are right, the stderr section explains your problem.
13:36 richardkiene would you imagine that is a configuration error, or a bug in the script?
13:36 richardkiene I'm struggling to find a way that it would be a user setup error
13:43 evgeniyl` I have no idea, try to log in on the controller and execute this command manually, check that glance works correctly, check the logs. MiroslavAnashkin did you see such a problem?
13:46 miguitas Hi, thanks for the help the other day, now I have a little issue. I'm seeing that the services are not updating their status; I checked the rabbitmq and mysql connections, they are working and I can connect to them, any idea?
13:49 richardkiene evgeniyl`: Thanks!
13:50 MiroslavAnashkin richardkiene: I've seen a similar issue - cirros image upload failure on an expired timeout. But I never saw an authentication error on upload.
13:50 MiroslavAnashkin richardkiene: please run the `source openrc` command as root before attempting to upload the cirros image manually.
13:51 MiroslavAnashkin richardkiene: it should set the necessary OpenStack credentials for command line tools
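A sketch of that manual check, run on a controller. The image name and flags are assumptions modeled on the astute code linked above; the file path comes from the trace. A 401 at this point indicates a credentials/keystone problem rather than a glance storage problem:

    # load the admin credentials written out by Fuel
    source /root/openrc
    # retry the upload by hand
    glance image-create --name 'TestVM' --is-public true \
        --container-format bare --disk-format qcow2 \
        --file /opt/vm/cirros-x86_64-disk.img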
13:53 MiroslavAnashkin miguitas: Do you mean nova* services, mysql or rabbitmq-server?
13:54 miguitas yeah nova-* looking in nova-manage service list
13:54 richardkiene MiroslavAnashkin: Right on, will do. I should have access to the fuel node in another hour. I'll report back in a bit. Definitely appreciate the pointers
14:00 miguitas mysql and rabbit works fine, I don't know why they are not updating their service status
14:00 miguitas I mean I don't know why nova-* are not updating their service status
14:01 MiroslavAnashkin miguitas: sorry, I forgot, are you using HA or a simple single controller installation?
14:02 miguitas in HA
14:04 miguitas using HA, I checked that haproxy is working
14:05 pvbill joined #fuel
14:07 MiroslavAnashkin please run `mysql -e "show status like 'wsrep%';"` first and check that the cluster is synced and all controllers are in the cluster
14:08 besem9krispy joined #fuel
14:09 MiroslavAnashkin miguitas: if everything is OK with mysql - you may simply restart all the nova services with XXX status, though not all of them are required to be started.
14:16 miguitas MiroslavAnashkin: All nodes are working and synced, I restarted the services and they are not updating their status
14:16 miguitas but are working perfectly
14:18 MiroslavAnashkin miguitas: Hmm, could you please share your `nova-manage service list` output on http://paste.openstack.org/ ?
14:18 miguitas if I use nova-manage to disable one of the services it works, nova-manage can update the database
14:20 MiroslavAnashkin miguitas: several nova services should be started under the nova account.
14:27 miguitas MiroslavAnashkin: http://paste.openstack.org/show/61358/
14:29 MiroslavAnashkin miguitas: So, all the stopped services are located on the same machine...
14:29 miguitas yes, it's all in the same machine
14:30 miguitas damn, all are in the same machine
14:30 MiroslavAnashkin miguitas: yes, everything works, except one controller is out of service
14:30 miguitas and the database connection seems to be working
14:31 miguitas and the services on the node look like they are working too
14:32 MiroslavAnashkin miguitas: Then, please check if the problematic controller is listed among the running nodes in `rabbitmqctl cluster_status`
14:33 miguitas http://paste.openstack.org/show/61360/
14:35 mihgen joined #fuel
14:38 MiroslavAnashkin miguitas: Ok, rabbit is OK. Please run `crm status` then.
14:39 miguitas http://paste.openstack.org/show/61361/
14:41 MiroslavAnashkin miguitas: please share `mysql -e "show status like 'wsrep%';"` from any controller as well
14:56 miguitas http://paste.openstack.org/show/61363/
14:59 MiroslavAnashkin miguitas: have you rebooted your nodes recently?
15:00 miguitas yes, this morning
15:01 MiroslavAnashkin miguitas: Such an issue is possible when you reboot the whole cluster and it takes too long to sync mysql. In that case all the nova services on the non-synced node stop on timeout.
15:02 miguitas Oh, is there any way to force mysql to sync ?
15:03 MiroslavAnashkin miguitas: Mysql is already synced. So,  you may simply restart nova services on that node (consoleauth first) or reboot the whole node-61.domain.tld
15:03 miguitas ok, thanks
15:04 MiroslavAnashkin miguitas: And please check all network interfaces on node-61.domain.tld - the management network is working, but others may not be
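On CentOS, the restart suggested for the stale controller would look roughly like the following; the exact service set is an assumption (it depends on what the node runs), and consoleauth goes first as noted above:

    # on the affected controller: restart the nova services, consoleauth first
    service openstack-nova-consoleauth restart
    for svc in scheduler conductor cert api; do
        service openstack-nova-$svc restart
    done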
15:16 e0ne joined #fuel
15:30 IlyaE joined #fuel
15:32 richardkiene joined #fuel
15:41 e0ne_ joined #fuel
16:00 geceyuruyucusu left #fuel
16:14 dhblaz joined #fuel
16:17 dhblaz Interesting problem, I moved my cluster to the datacenter yesterday now I get this:
16:17 dhblaz [root@node-16 ~]# nova service-list
16:17 dhblaz ERROR: An unexpected error prevented the server from fulfilling your request. (ProgrammingError) (1146, "Table 'keystone.user' doesn't exist") 'SELECT user.id AS user_id, user.name AS user_name, user.domain_id AS user_domain_id, user.password AS user_password, user.enabled AS user_enabled, user.extra AS user_extra, user.default_project_id AS user_default_project_id \nFROM user \nWHERE user.name = %s AND user.domain_id = %s' ('admin', 'default') (HTTP 500)
16:29 MiroslavAnashkin dhblaz: please try running `mysql -e "show status like 'wsrep%';"` to ensure MySQL is online.
16:32 dhblaz two nodes show wsrep_ready = ON
16:32 dhblaz One shows OFF
16:36 dhblaz Trying: crm resource restart clone_p_mysql
16:36 dhblaz To see if that will fix the issue with one node not synced
16:37 dhblaz It doesn't seem right that the two nodes that show "ON" also show:
16:37 dhblaz wsrep_cluster_status = Primary
16:38 MiroslavAnashkin dhblaz: wsrep_cluster_status = Primary is correct.
16:39 MiroslavAnashkin dhblaz: We configure Galera as master-master by default, without slave nodes. So, all synced nodes become primary
16:43 dhblaz I'm having trouble getting Galera to stabilize.  After crm resource restart clone_p_mysql one node isn't running mysql and it looks like the other two are not joined to the same cluster.
16:45 MiroslavAnashkin dhblaz: If 2 nodes are synced and one is not - please try `crm resource restart p_mysql` on the node with non-started MySQL
16:48 dhblaz Even after I get this sync issue resolved, I don't think it will bring back my keystone table
16:49 MiroslavAnashkin dhblaz: That is our next step - check whether the keystone table exists.
16:49 dhblaz All show ON now
16:50 MiroslavAnashkin dhblaz: if it exists - check the user list
16:51 dhblaz Shouldn't wsrep_cluster_size be > 1?
16:51 MiroslavAnashkin dhblaz: the master node has the user setting template in /etc/puppet/modules/galera/templates/wsrep-init-file.erb
16:52 MiroslavAnashkin dhblaz: yes, cluster size indicates the current number of nodes joined to the Galera cluster
16:52 dhblaz All three nodes show wsrep_cluster_size = 1
16:52 MiroslavAnashkin dhblaz: yes, and all 3 have status initialized
16:53 dhblaz Excuse the paste all three show:
16:53 dhblaz | wsrep_local_state         | 4      |
16:53 dhblaz | wsrep_local_state_comment | Synced |
16:54 MiroslavAnashkin dhblaz: Do you mean all 3 are Synced but cluster size is 1 on every node?
16:54 dhblaz Yes
16:55 MiroslavAnashkin great, Pacemaker started 3 different clusters.
16:56 MiroslavAnashkin Please run  `crm resource stop p_mysql` on one of machines
16:56 MiroslavAnashkin Then run `crm resource restart p_mysql` on other node
16:57 dhblaz http://paste.openstack.org/show/61373/
16:57 dhblaz That paste is from before I did the stop/restart you suggested
16:58 MiroslavAnashkin yes, it shows 3 clusters
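A quick way to confirm the split, run on each controller: in a healthy cluster wsrep_cluster_state_uuid is identical on all nodes, wsrep_cluster_size equals the number of controllers, and wsrep_incoming_addresses lists all of them. This is a sketch using only standard wsrep status variables:

    mysql -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_state_uuid','wsrep_cluster_size','wsrep_incoming_addresses','wsrep_local_state_comment');"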
16:58 miguitas marcinkuzminski: finally I found what the problem was: rabbitmq. the cluster status looked perfect, but it was totally broken when it tried to rejoin the cluster after the reboot...
16:59 MiroslavAnashkin Please check that the node you just restarted joined the available running cluster instead of starting a new one
16:59 marcinkuzminski miguitas: i lost the context for that ?
16:59 dhblaz MiroslavAnashkin: I think that `crm resource stop p_mysql` operates on all nodes.
17:00 MiroslavAnashkin marcinkuzminski: That was probably to me
17:00 miguitas yes sorry
17:00 miguitas -_-
17:00 marcinkuzminski cheers ! :)
17:00 dhblaz Because after I ran it crm resource list shows:
17:00 dhblaz Clone Set: clone_p_mysql [p_mysql]
17:00 dhblaz Stopped: [ p_mysql:0 p_mysql:1 p_mysql:2 ]
17:00 miguitas the autocomplete
17:00 MiroslavAnashkin dhblaz: No, p_mysql is a local resource but clone_p_mysql is cluster wide.
17:01 dhblaz my node-16 still reports wsrep_cluster_size=1
17:01 miguitas MiroslavAnashkin: many thanks for your help
17:01 dhblaz From my history:
17:01 dhblaz 845  crm resource stop p_mysql
17:01 dhblaz 846  crm resource list
17:01 dhblaz 847  crm resource restart p_mysql
17:01 dhblaz all three were stopped now all three are started
17:03 dhblaz Confirmed that other nodes see wsrep_cluster_size=1
17:10 dhblaz Desperate for things to try I compared /etc/mysql/conf.d/wsrep.cnf from all three nodes
17:10 dhblaz Diffs show only the IP addresses differ
17:11 MiroslavAnashkin dhblaz: All these nodes are managed by pacemaker, so it generates the config on-the-fly
17:12 MiroslavAnashkin dhblaz: And it looks like this script is buggy: it always sets all 3 of my nodes to the Initialized state without starting a new cluster. In your case it started 3 new clusters.
17:15 dhblaz Here are the command lines from the process table
17:15 dhblaz where xxx = 5010, 1567, 3825
17:15 MiroslavAnashkin dhblaz: Please try `crm resource restart clone_p_mysql --cleanup`
17:15 dhblaz /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-cluster-address=gcomm:// --pid-file=/var/run/mysql/mysqld.pid --port=3307 --wsrep_start_position=01bf805b-7ec9-11e3-0800-15d545645c23:3825
17:16 MiroslavAnashkin dhblaz: Yes, the parameter --wsrep-cluster-address=gcomm:// orders the node to start a new cluster instead of joining an existing one
17:17 dhblaz [root@node-16 ~]# crm resource restart clone_p_mysql --cleanup
17:17 dhblaz usage: restart <rsc>
17:17 dhblaz should I do a stop/cleanup/restart?
17:17 MiroslavAnashkin Then, `crm resource cleanup clone_p_mysql`
17:18 MiroslavAnashkin and after that `crm resource start clone_p_mysql`
17:18 dhblaz so stop/cleanup/restart/cleanup/start?
17:18 MiroslavAnashkin dhblaz: cleanup, then start
17:18 dhblaz no stop?
17:19 MiroslavAnashkin No
17:20 dhblaz The process ID didn't change
17:20 dhblaz So I don't think that it was effective
17:22 MiroslavAnashkin OK, then stop-cleanup-start
17:26 dhblaz Process table still shows "--wsrep-cluster-address=gcomm://" and show status like 'wsrep%'; shows wsrep_cluster_size=1
17:26 dhblaz Process id is different now
17:28 richardkiene An update from my earlier comments
17:28 richardkiene It appears that the 401 response is due to the password we used for the admin OpenStack account
17:29 richardkiene it contained multiple dollar signs
17:29 richardkiene which I'm assuming caused an escaping issue
17:29 richardkiene we're re-deploying with a different password to see if that solves the issue
17:30 richardkiene everything else looked like it was properly setup, it just couldn't upload the image to glance because it was not properly escaping the password (again, just an assumption)
17:36 dhblaz MiroslavAnashkin: Here is a snippet from my daemon.log
17:36 rmoe joined #fuel
17:36 dhblaz http://paste.openstack.org/show/61376/
17:40 MiroslavAnashkin Yes, before assembling the cluster it waits for 5 minutes after start
17:47 dhblaz It seems like something happens before that 5 minutes is up because I don't see any additional logs 5 minutes later
17:52 SteAle joined #fuel
17:58 MiroslavAnashkin dhblaz: previously it might take up to 20 minutes to assemble the galera cluster. I thought we had already fixed that...
18:01 SteAle joined #fuel
18:03 dhblaz MiroslavAnashkin: next step?
18:05 teran joined #fuel
18:10 MiroslavAnashkin dhblaz: There is a rebuild_cluster procedure inside the /usr/lib/ocf/resource.d/mirantis/mysql-wss script, but it is not included in the list of script input parameters.
18:12 dhblaz Are you familiar with this error in the logs?
18:12 dhblaz warning: decode_transition_key: Bad UUID
18:14 vkozhukalov joined #fuel
18:15 mrasskazov joined #fuel
18:16 MiroslavAnashkin There are 3 UUIDs in Galera. One is the cluster UUID, the second is the node UUID, and one more node UUID exists only in /var/lib/mysql/grastate.dat
18:18 dhblaz joined #fuel
18:18 dhblaz Sorry, connection died
18:18 dhblaz looking at log...
18:25 MiroslavAnashkin UUID increases with every committed transaction. Every node should synchronize itself with all the nodes with greater UUID
18:27 dhblaz I can't get p_mysql to start on just one node.
18:27 dhblaz Based on what I read in galera I need one node to start first
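For background on what "one node starts first" means at the Galera level, a generic manual bootstrap (outside of pacemaker, with placeholder addresses) looks roughly like this; in a Fuel cluster pacemaker is supposed to do the equivalent itself:

    # on the bootstrap node only: start a brand new cluster
    mysqld_safe --wsrep-cluster-address="gcomm://" &
    # on every other node: join the existing cluster by listing its members
    mysqld_safe --wsrep-cluster-address="gcomm://10.0.0.11,10.0.0.12,10.0.0.13" &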
18:29 e0ne_ joined #fuel
18:29 richardkiene MiroslavAnashkin and evgeniyl`: It successfully deployed after removing the $'s from the password
18:30 richardkiene Seems like a Glance API bug
18:32 angdraug joined #fuel
18:33 xarses joined #fuel
18:33 tatyana_ joined #fuel
18:34 dhblaz Wow:
18:34 dhblaz [root@node-16 ~]# crm cib diff
18:34 dhblaz [root@node-16 ~]# crm_shadow -d
18:34 dhblaz Segmentation fault
18:35 MiroslavAnashkin dhblaz: Hmm, same thing on my cluster...
18:41 e0ne joined #fuel
18:43 MiroslavAnashkin dhblaz: Please try `crm_attribute -t crm_config --name mysqlprimaryinit --delete` on each controller
18:44 MiroslavAnashkin Then, select one controller and run `crm_attribute -t crm_config --name mysqlprimaryinit --update done`
18:44 dhblaz I got this on node-16: Deleted crm_config attribute: id=cib-bootstrap-options-mysqlprimaryinit name=mysqlprimaryinit
18:44 dhblaz And nothing on 17 and 18
18:46 MiroslavAnashkin And then run `crm resource restart clone_p_mysql` on that controller. And wait 5+ minutes
18:50 dhblaz MiroslavAnashkin: I didn't do the restart yet
18:50 dhblaz I saw that I had two nodes up and the other was stuck.
18:51 dhblaz Now I have three nodes "ON"
18:51 dhblaz Do you still want me to to do the restart?
18:52 MiroslavAnashkin If all 3 are synced, all 3 are listed in wsrep_incoming_addresses and all 3 have status primary - then no.
18:52 MiroslavAnashkin If there are Donor nodes - then also no
18:54 dhblaz Okay
18:54 dhblaz it still gives an error about the missing keystone table
18:56 MiroslavAnashkin Please also restart all nova* and/or openstack-nova* services on every controller
18:56 MiroslavAnashkin Hmm, and on compute nodes as well, they all caught timeouts
18:58 dhblaz You think that will resolve the missing keystone table?
19:00 MiroslavAnashkin And please run `crm resource restart p_neutron-plugin-openswitch-agent` , it also depends on MySQL
19:03 dhblaz you mean `crm resource restart p_neutron-openvswitch-agent`?
19:05 MiroslavAnashkin dhblaz: What is in your `crm resource list` ?
19:05 dhblaz http://paste.openstack.org/show/61382/
19:06 MiroslavAnashkin No, clone_p_neutron-openvswitch-agent.
19:07 MiroslavAnashkin crm should start it cluster wide and select a single node to start up p_neutron-dhcp-agent and p_neutron-l3-agent
19:11 dhblaz Done but the error still remains
19:12 MiroslavAnashkin what does `crm status` tell you?
19:12 MiroslavAnashkin And `nova-manage service list`, if everything is OK with crm?
19:14 dhblaz http://paste.openstack.org/show/61383/
19:16 dhblaz When I strip out the DEBUG messages I get all :-)
19:17 IlyaE joined #fuel
19:19 MiroslavAnashkin Ok, looks like the backends are up and working. Now, please run `mysql -e "show databases;"` and check if keystone is in place.
19:20 dhblaz I see keystone in the database list
19:20 dhblaz http://paste.openstack.org/show/61384/
19:21 dhblaz And the tables from the keystone db:
19:21 dhblaz http://paste.openstack.org/show/61385/
19:23 MiroslavAnashkin And no "user" table in keystone.
19:26 dhblaz No, there isn't
19:26 dhblaz Should I use /usr/bin/openstack-db to fix this?
19:43 MiroslavAnashkin dhblaz: Hmm, not sure if we should run `keystone-manage db_sync` or openstack-db...
19:43 dhblaz Ultimately openstack-db runs keystone-manage db_sync
19:44 dhblaz I ran keystone-manage db_sync while I was waiting for you
19:44 dhblaz mysql dumps of keystone db before and after show no change
19:44 MiroslavAnashkin But a working keystone db should include a set of tables like the following
19:44 MiroslavAnashkin http://paste.openstack.org/show/61390/
19:47 MiroslavAnashkin dhblaz: Yes, please run /usr/bin/openstack-db
19:50 dhblaz running: openstack-db --password <PW FROM connection LINE in /etc/keystone/keystone.conf> --service keystone --init
19:54 e0ne joined #fuel
19:55 dhblaz I had to comment out the check that mysql was running because it doesn't work properly (on centos at least)
19:55 dhblaz Then I get:
19:55 dhblaz Please enter the password for the 'root' MySQL user:
19:55 dhblaz Verified connectivity to MySQL.
19:55 dhblaz Database 'keystone' already exists. Please consider first running:
19:55 dhblaz /usr/bin/openstack-db --drop --service keystone
19:56 dhblaz I have a backup so I am going to run that
20:01 dhblaz Okay, I have a working keystone db now but it is empty :(
20:06 MiroslavAnashkin Does openstack-db also run `nova-manage db_sync` and `cinder-manage db_sync` ?
20:09 dhblaz with different options I believe so
20:19 dhblaz Is there a way to employ puppet to set all this back up for me?
20:22 MiroslavAnashkin you may run `puppet agent -t` or `puppet apply /etc/puppet/manifests/site.pp`, depending on your Fuel version
20:23 dhblaz What do you get when you do `keystone tenant-list`
20:25 MiroslavAnashkin List of tenants
20:25 MiroslavAnashkin Admin and one more
20:28 MiroslavAnashkin joined #fuel
20:37 rmoe joined #fuel
20:41 pvbill joined #fuel
20:44 mrasskazov joined #fuel
20:48 mrasskazov1 joined #fuel
20:52 pvbill Does anyone know what's the most you can oversubscribe VMs/vCPUs when they're under little to no load on these machines? (would like to allocate hundreds at a time)
20:55 dhblaz I have a machine at 10:1 but it is painful at times.  I think the problems on this machine are more about the RAM oversubscription than the CPU.
20:55 dhblaz Can anyone tell me an easy way to re-setup the keystone table for 4.0 cluster
20:56 mrasskazov joined #fuel
21:04 xarses pvbill: I've had up to 8:1 on vCPU, as dhblaz notes, RAM is worse
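For context, this kind of oversubscription is governed by the nova scheduler's allocation ratios. A sketch of setting them explicitly on a CentOS controller (openstack-config comes from the openstack-utils package; the values shown are nova's historical defaults, not recommendations):

    # 16 vCPUs per physical core, 1.5x RAM oversubscription
    openstack-config --set /etc/nova/nova.conf DEFAULT cpu_allocation_ratio 16.0
    openstack-config --set /etc/nova/nova.conf DEFAULT ram_allocation_ratio 1.5
    service openstack-nova-scheduler restart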
21:04 xarses dhblaz: resetup?
21:05 dhblaz xarses: I'm not sure what resetup is
21:06 dhblaz My keystone database lost some if its tables when I was running health checks
21:06 xarses "Can anyone tell me an easy way to re-setup the keystone table for 4.0 cluster"
21:06 dhblaz I would rather not re deploy just to get all the keystone tenants, users, roles etc setup again
21:07 xarses from a controller try 'keystone-manage db_sync'
21:07 pvbill_ joined #fuel
21:07 xarses all of the services have some form of -manage db_sync
21:07 xarses sometimes its dbsync
21:07 dhblaz I ran that but it doesn't populate the db, it just sets it up.
21:08 xarses ahh so you're missing the endpoints and the like
21:08 dhblaz Yes
21:09 xarses it's not as discrete as you would probably want , but you can just force the controller role to re-run and it should repopulate all that
21:09 xarses are you on 4.0 or 3.2.x?
21:10 dhblaz I'm missing trust, trust_role, user, user_domain_metadata, user_group_membership, user_project_metadata
21:10 dhblaz 4.0
21:11 xarses you need to ensure that the /etc/astute.yaml 'role:' is 'controller' or 'primary-controller'; if it's neither, set it to 'controller'
21:13 xarses and then 'puppet apply /etc/puppet/manifests/site.pp'
21:15 dhblaz should I comment out all the stages except for cluster_head?
21:18 xarses it should be safe to re-run the entire role, you can try parting it out, but I can't ensure the results off the top of my head
21:19 mrasskazov joined #fuel
21:23 dhblaz Thanks xarses, running now with fingers crossed
21:23 dhblaz BTW, are upgrades on the roadmap?
21:24 xarses which kind?
21:25 xarses https://wiki.openstack.org/wiki/Fuel#Roadmap
21:27 dhblaz Thanks
21:27 dhblaz do you have a minute to talk about vlan splinters?
21:32 e0ne joined #fuel
21:35 dhblaz joined #fuel
21:38 mrasskazov joined #fuel
21:41 xarses sure
21:42 mrasskazov joined #fuel
21:45 dhblaz did you find a definitive list of buggy drivers?
21:46 dhblaz and do you know whether a nic that passes ovs-vlan-test is indeed good?
21:46 dhblaz (i.e. not needing vlan splinters)
22:40 xarses if the nic passes ovs-vlan-test it should be good. As to the list of drivers, it's the same list of suspect drivers as before. There additionally appears to be some variance between nic rev and firmware versions, besides the kernel, that determines whether ovs vlans will work or not
22:41 xarses it's really quite a pain to test and certify that one driver works over another
22:42 xarses the centos kernel changed in 4.0 from 3.2.x so you might see the problem less even if you are on a suspect driver
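For reference, ovs-vlan-test needs a server on one host and a client on another, each with an address on the untagged control network and one on the VLAN under test. The addresses below are placeholders and the exact invocation should be checked against ovs-vlan-test(8); roughly:

    # on host A (server): listen on the control network, 10.0.0.1 is its IP on the VLAN under test
    ovs-vlan-test -s 0.0.0.0 10.0.0.1
    # on host B (client): connect to host A's control IP, 10.0.0.2 is its own IP on the same VLAN
    ovs-vlan-test 192.168.0.10 10.0.0.2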
