Please note: This schedule is for OpenStack Active Technical Contributors participating in the Icehouse Design Summit sessions in Hong Kong. These are working sessions to determine the roadmap of the Icehouse release and make decisions across the project. To see the full OpenStack Summit schedule, including presentations, panels and workshops, go to http://openstacksummitnovember2013.sched.org.
Thursday, November 7 • 1:50pm - 2:30pm
Hardware management ramdisk


Ironic has, broadly speaking, two means to manage hardware:
* by booting a ramdisk with specialized functionality and performing actions locally
* over the network, via out-of-band (OOB) management tools

The REST and RPC APIs should expose a unified means to manage hardware, regardless of which driver is used and whether that driver acts on the nodes via the OOB management interface or via local operations in a custom ramdisk.

We will discuss the API changes needed to expose various hardware management operations, and the creation of a reference implementation for them based on a bootable ramdisk.

* How do we expose these functions through the REST API?
* Who are the consumers of these functions? Cloud admins? Nova? Heat?
* Do we need an agent in the ramdisk with its own API?
* Do we need distinct ramdisks for different operations (eg, update-firmware, build-raid, erase-disks, etc)?
* How do we get logs back from the ramdisk, and how do we report errors back to the user?
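
As a concrete strawman for this discussion, below is a minimal sketch of how a single management interface could be backed by either an out-of-band driver or a ramdisk-based driver. All class and method names here are hypothetical illustrations, not Ironic's actual driver API.

    import abc


    class ManagementInterface(abc.ABC):
        """Hypothetical unified interface implemented by every driver type."""

        @abc.abstractmethod
        def erase_disks(self, node):
            """Securely erase the node's local disks."""

        @abc.abstractmethod
        def update_firmware(self, node, firmware_image):
            """Apply a firmware image to the node."""


    class OOBManagement(ManagementInterface):
        """Performs the action over the out-of-band management channel."""

        def erase_disks(self, node):
            print("issuing OOB secure-erase command to %s" % node)

        def update_firmware(self, node, firmware_image):
            print("pushing %s to %s via the OOB interface" % (firmware_image, node))


    class RamdiskManagement(ManagementInterface):
        """Boots a special-purpose ramdisk and runs the action locally."""

        def erase_disks(self, node):
            print("booting erase ramdisk on %s" % node)

        def update_firmware(self, node, firmware_image):
            print("booting firmware-update ramdisk on %s with %s"
                  % (node, firmware_image))


    def handle_erase_request(driver, node):
        """What a single REST endpoint could call, regardless of driver type."""
        driver.erase_disks(node)

The point is that the REST layer would call the same method regardless of which implementation a node's driver provides; the choice of OOB versus ramdisk stays hidden behind the driver.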

This session will include the following subject(s):

Data protection for bare metal nodes:

The problem
------------------
There are several cases in which private data might be accessible to third-party users:
- A new user of a node provisions it with a smaller partition size than the previous user did, so some data may remain on the volume.
- The previous user exceeded the size of physical memory, so some data may remain on the swap partition.


Possible solutions
--------------------------
- Build a special undeploy image and use it for either:
  - securely erasing the volume on the node side (sketched below), or
  - exporting the volume to the manager and performing the erase on the manager side
- Create a separate boot configuration on the node that loads a kernel and a ramdisk with undeploy scripts in it
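
For illustration only, here is a minimal sketch of what the node-side erase step of an undeploy ramdisk might look like. The device path, pass count, and the choice of the GNU shred utility are assumptions, not part of the proposal.

    import subprocess


    def wipe_block_device(device="/dev/sda", passes=1):
        """Overwrite a block device with random data, then zero it.

        Uses the standard GNU `shred` utility. A real undeploy ramdisk might
        prefer an ATA Secure Erase (e.g. via hdparm) when the drive supports
        it, since fully overwriting a large disk can take hours.
        """
        subprocess.run(
            ["shred", "--verbose", "--iterations=%d" % passes, "--zero", device],
            check=True,
        )


    if __name__ == "__main__":
        # In a real ramdisk the device list would come from Ironic rather
        # than being hard-coded here.
        wipe_block_device("/dev/sda")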

Food For Thought
--------------------------
- Should wiping be part of deploying or undeploying?
- Should we wipe all nodes or wipe them on demand?
  - Wiping all nodes might not be required for everyone
  - Securely wiping a node requires a lot of time


Related bug report:
https://bugs.launchpad.net/ironic/+bug/1174153

(Session proposed by Roman Prykhodchenko)

Communicating with the nodes:

It would be nice to formalize what a node under Ironic's control communicates to Ironic, and how. Currently a node's only communication is a signal to start the deploy and a return signal that the deploy is complete. Ironic should support a dynamic conversation between itself and the nodes it is controlling.

Ironic will need to support several new areas of communication:
* All node actions
  * will need to send basic logging back to Ironic (see the sketch after this list)
  * should be interruptible
* Deployment (if done by an agent on the node)
  * nodes will need a way to communicate with Ironic to get the image to be deployed
  * Ironic will need to communicate RAID setup and disk partition information to nodes
* Hardware & firmware discovery
  * nodes will need a way to send information about their hardware and current firmware revisions to Ironic
  * do nodes need to be able to (re)discover replaced hardware, such as a NIC?
* Firmware update
  * Ironic will need a way to push firmware updates to a node
* Secure erase
  * nodes will need to communicate progress back to Ironic
  * Ironic will need to communicate which devices to erase and how many cycles to run
* Burn-in
  * nodes performing a burn-in will need to communicate any failures back to Ironic
  * Ironic will need a way to specify which burn-in tests to run
  * Ironic will need to specify how long / how many tests to run
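
To ground the logging and progress-reporting items above, here is a minimal sketch of how an in-ramdisk agent might call back to Ironic over HTTP. The callback URL, the payload fields, and the existence of such an endpoint are all assumptions, not an existing Ironic API.

    import json
    import urllib.request

    # Hypothetical callback endpoint; Ironic does not define this URL.
    CALLBACK_URL = "http://ironic.example.com:6385/v1/nodes/%s/agent_status"


    def report_status(node_uuid, state, log_lines):
        """POST a progress update and recent log lines back to the conductor."""
        payload = json.dumps({
            "state": state,        # e.g. "erasing", "burn-in", "error"
            "logs": log_lines,     # tail of the agent's local log buffer
        }).encode("utf-8")
        request = urllib.request.Request(
            CALLBACK_URL % node_uuid,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status

A production agent would also need authentication, retries, and log batching, none of which are sketched here.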

-----------------

Open questions:
* Some of the operations above (eg. discovery, RAID setup, firmware) may be performed via multiple vectors (eg, IPMI)
* Some may be best served by borrowing from other OpenStack services (eg, cinder for RAID vol spec)
* Not all deployers will want all of these features, and some may use vendor extensions to accelerate specific features (eg, firmware mgmt). How do we support this mixture?


Etherpad:
https://etherpad.openstack.org/p/IcehouseIronicNodeCommunication

(Session proposed by Chris Krelle)


Thursday November 7, 2013 1:50pm - 2:30pm HKT
AWE Level 2, Room 201A
