Monday, July 22, 2013

Manual Compile of libvirt to Resolve CloudStack and Ceph RBD Storage Issue

I installed CloudStack 4.1.0 on an Ubuntu 12.04.2 LTS (precise) server. I initially wanted to use Ubuntu 13.04 (raring), but CloudStack only provides a package repository for Ubuntu 12.04. The hypervisor hosts run KVM, also on Ubuntu 12.04.2 LTS, and I use Ceph RBD (RADOS Block Device) as the primary storage for CloudStack.

The default libvirt version on Ubuntu 12.04 doesn’t support Ceph RBD as primary storage. I followed these instructions from Wido to get libvirt version 1.0.2, which supports RBD storage pools. However, I ran into an issue where libvirt reported incorrect disk usage / allocation information for the RBD storage pool.

root@hv-kvm-02:~# virsh pool-info bab81ce8-d53f-3a7d-b8f6-841702f65c89
Name:           bab81ce8-d53f-3a7d-b8f6-841702f65c89
UUID:           bab81ce8-d53f-3a7d-b8f6-841702f65c89
State:          running
Persistent:     no
Autostart:      no
Capacity:       5.47 TiB
Allocation:     34819.02 TiB
Available:      5.47 TiB

As a result, VM instance creation failed: the RBD storage pool was reported as having insufficient disk space, so CloudStack wasn’t able to find a suitable, available storage pool.

2013-07-15 11:15:28,313 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Checking pool: 208 for volume allocation [Vol[227|vm=225|ROOT]], maxSize : 15828044742656, totalAllocatedSize : 1769538048, askingSize : 8589934592, allocated disable threshold: 0.85
2013-07-15 11:15:28,313 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Checking if storage pool is suitable, name: sc-image ,poolId: 209
2013-07-15 11:15:28,313 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is localStorageAllocationNeeded? false
2013-07-15 11:15:28,313 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is storage pool shared? true
2013-07-15 11:15:28,317 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Checking pool 209 for storage, totalSize: 6013522722816, usedBytes: 38283921137336466, usedPct: 6366.305226067051, disable threshold: 0.85
2013-07-15 11:15:28,317 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Insufficient space on pool: 209 since its usage percentage: 6366.305226067051 has crossed the pool.storage.capacity.disablethreshold: 0.85
2013-07-15 11:15:28,317 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) FirstFitStoragePoolAllocator returning 1 suitable storage pools
2013-07-15 11:15:28,317 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) Checking suitable pools for volume (Id, Type): (228,DATADISK)
2013-07-15 11:15:28,317 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) We need to allocate new storagepool for this volume
2013-07-15 11:15:28,319 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) Calling StoragePoolAllocators to find suitable pools
2013-07-15 11:15:28,319 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) Looking for pools in dc: 6 pod:6 cluster:6 having tags:[rbd]
2013-07-15 11:15:28,322 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) FirstFitStoragePoolAllocator has 1 pools to check for allocation
2013-07-15 11:15:28,322 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Checking if storage pool is suitable, name: sc-image ,poolId: 209
2013-07-15 11:15:28,322 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is localStorageAllocationNeeded? false
2013-07-15 11:15:28,322 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-3:job-168) Is storage pool shared? true
2013-07-15 11:15:28,326 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Checking pool 209 for storage, totalSize: 6013522722816, usedBytes: 38283921137336466, usedPct: 6366.305226067051, disable threshold: 0.85
2013-07-15 11:15:28,326 DEBUG [cloud.storage.StorageManagerImpl] (Job-Executor-3:job-168) Insufficient space on pool: 209 since its usage percentage: 6366.305226067051 has crossed the pool.storage.capacity.disablethreshold: 0.85
2013-07-15 11:15:28,326 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-3:job-168) FirstFitStoragePoolAllocator returning 0 suitable storage pools
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) No suitable pools found for volume: Vol[228|vm=225|DATADISK] under cluster: 6
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) No suitable pools found
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) No suitable storagePools found under this Cluster: 6
2013-07-15 11:15:28,326 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-3:job-168) Could not find suitable Deployment Destination for this VM under any clusters, returning.
2013-07-15 11:15:28,332 DEBUG [cloud.vm.UserVmManagerImpl] (Job-Executor-3:job-168) Destroying vm VM[User|Indra-Test-3] as it failed to create on Host with Id:null
2013-07-15 11:15:28,498 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-3:job-168) VM state transitted from :Stopped to Error with event: OperationFailedToErrorvm's original host id: null new host id: null host id before state transition: null
2013-07-15 11:15:29,125 INFO [user.vm.DeployVMCmd] (Job-Executor-3:job-168) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|Indra-Test-3]Scope=interface com.cloud.dc.DataCenter; id=6
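
The numbers in the log can be reproduced by hand. The logged figures are consistent with usedPct = usedBytes / totalSize, compared against the 0.85 threshold; this is inferred from the values above rather than taken from the CloudStack source. A minimal sketch of that arithmetic follows (the function names usage_ratio and over_threshold are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helpers that redo the arithmetic seen in the log above.
# Assumption: usedPct = usedBytes / totalSize (inferred from the logged values).

# usage_ratio TOTAL_BYTES USED_BYTES: print used/total to two decimal places.
usage_ratio() {
    LC_ALL=C awk -v total="$1" -v used="$2" \
        'BEGIN { printf "%.2f\n", used / total }'
}

# over_threshold TOTAL_BYTES USED_BYTES THRESHOLD:
# succeed (exit 0) when used/total is greater than THRESHOLD.
over_threshold() {
    LC_ALL=C awk -v total="$1" -v used="$2" -v limit="$3" \
        'BEGIN { exit !(used / total > limit) }'
}

# Values for pool 209, copied from the log.
usage_ratio 6013522722816 38283921137336466    # prints 6366.31
if over_threshold 6013522722816 38283921137336466 0.85; then
    echo "pool 209 is treated as full"
fi
```

A ratio of about 6366 against a threshold of 0.85 means the pool is rejected no matter how small the requested volume is, which is why every deployment attempt ended in InsufficientServerCapacityException.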

After consulting the CloudStack users’ mailing list and logging a bug report on Apache’s JIRA here without any success, I managed to resolve the problem by compiling and installing the latest version of libvirt. This is how I did it on my KVM hypervisor hosts running Ubuntu 12.04.2 LTS:

1. Download the latest libvirt version (1.1.0) from libvirt’s FTP site, and extract it:

ftp://libvirt.org/libvirt/libvirt-1.1.0.tar.gz

2. Install the required packages for compiling libvirt:

apt-get install librbd-dev
apt-get install libpciaccess-dev

3. Configure the build with RBD storage support, and set the install prefixes so that the new build overwrites the existing default libvirt on the Ubuntu server:

./autogen.sh --prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --with-storage-rbd

4. After the configuration step has completed, check its output to confirm that RBD storage support is enabled, then build and install:

make
make install

5. Restart the KVM hosts after the installation is done, and then verify that the latest version of libvirt has been installed:

libvirtd --version
virsh --version
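
The checks in steps 4 and 5 can also be scripted instead of done by eye. Below is a rough sketch under a few assumptions: the output of the configure step was saved to a file (configure-output.log is a made-up name), its summary contains a line of the form "RBD: yes" when the RBD storage driver is enabled, and GNU sort with -V is available. The helper names rbd_enabled and version_at_least are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for steps 4 and 5; nothing here ships with libvirt.

# rbd_enabled LOGFILE: succeed when the saved configure output reports RBD
# as enabled. Assumes a summary line such as "RBD: yes"; adjust the pattern
# if the output on your system looks different.
rbd_enabled() {
    grep -Eq 'RBD:[[:space:]]*yes' "$1"
}

# version_at_least INSTALLED REQUIRED: succeed when INSTALLED >= REQUIRED.
# Uses GNU sort -V (version sort) to find the lower of the two versions.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage on a hypervisor host (commented out so the file can be
# sourced safely on a machine without libvirt):
#   ./autogen.sh --prefix=/usr --with-storage-rbd 2>&1 | tee configure-output.log
#   rbd_enabled configure-output.log || echo "RBD support is NOT enabled"
#   version_at_least "$(virsh --version)" 1.1.0 || echo "old libvirt still in use"
```

If the version check still reports the old release after the reboot, the packaged libvirt binaries are probably still the ones found first on the PATH.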

The command “virsh pool-info” now shows the correct “Allocation” amount:

root@hv-kvm-02:~# virsh pool-info d433809b-01ea-3947-ba0f-48077244e4d6
Name:           d433809b-01ea-3947-ba0f-48077244e4d6
UUID:           d433809b-01ea-3947-ba0f-48077244e4d6
State:          running
Persistent:     no
Autostart:      no
Capacity:       5.47 TiB
Allocation:     328.00 B
Available:      5.47 TiB
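
To catch this kind of bogus figure without reading the output by eye, the Capacity and Allocation lines can be compared in a small script. This is only a sketch: pool_allocation_sane is a made-up helper, and it assumes the binary unit suffixes seen in the outputs in this post (B, KiB, MiB, GiB, TiB, PiB).

```shell
#!/bin/sh
# Hypothetical helper: reads `virsh pool-info` output on stdin and succeeds
# (exit 0) only when Allocation does not exceed Capacity.
pool_allocation_sane() {
    LC_ALL=C awk '
        BEGIN {
            # Multipliers for the unit suffixes (assumed set, see lead-in).
            m["B"] = 1;        m["KiB"] = 1024;   m["MiB"] = 1024^2
            m["GiB"] = 1024^3; m["TiB"] = 1024^4; m["PiB"] = 1024^5
        }
        $1 == "Capacity:"   { cap   = $2 * m[$3] }
        $1 == "Allocation:" { alloc = $2 * m[$3] }
        # An unknown unit yields a capacity of 0, which is reported as a failure.
        END { exit !(cap > 0 && alloc <= cap) }
    '
}

# Example usage on a hypervisor host (commented out so the file can be
# sourced safely on a machine without libvirt):
#   virsh pool-info <pool-uuid> | pool_allocation_sane \
#       || echo "libvirt reports more allocated than the pool can hold"
```

Fed the first output in this post (Allocation 34819.02 TiB against a Capacity of 5.47 TiB) the check fails, while the output above passes.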

CloudStack will then be able to utilise the RBD storage pool when creating VM instances.