Goal

Migrate the nodes of a DCOS system to a Kubernetes system without downtime. This requires migrating nodes and apps in turns, alternating between two plans:

  • nodePlan: List of nodes that can be migrated from DCOS to Kubernetes.
  • appPlan: List of apps that can be migrated from DCOS to Kubernetes.
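The alternation can be sketched as a small driver loop. This is a minimal Python sketch, not the authors' implementation. `build_node_plan`, `build_app_plan` and `apply_plan` are hypothetical callables that stand in for the two processes described below and for whatever actually drains nodes or redeploys apps. Following the diagrams, an empty plan hands control to the other plan, and the loop stops once both come back empty in a row.

```python
from typing import Callable, List, Tuple

Plan = List[str]


def migrate(
    build_node_plan: Callable[[], Plan],
    build_app_plan: Callable[[], Plan],
    apply_plan: Callable[[str, Plan], None],
    max_rounds: int = 1000,
) -> List[Tuple[str, Plan]]:
    """Alternate node plans and app plans; stop when both come back empty."""
    builders = {"nodes": build_node_plan, "apps": build_app_plan}
    kind = "nodes"
    empty_in_a_row = 0
    history: List[Tuple[str, Plan]] = []
    for _ in range(max_rounds):
        plan = builders[kind]()
        if plan:
            apply_plan(kind, plan)  # hypothetical: drain nodes / redeploy apps
            history.append((kind, plan))
            empty_in_a_row = 0
        else:
            empty_in_a_row += 1
            if empty_in_a_row == 2:  # neither side can make progress
                break
        # Whatever happened, the next turn belongs to the other plan.
        kind = "apps" if kind == "nodes" else "nodes"
    return history
```

Two empty plans in a row means either everything has been migrated or no further progress is possible; the driver cannot tell these apart, so the caller has to check which one it is.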

Challenge

  • We must ensure that apps always have enough CPU and memory to run wherever they are allocated (be it DCOS or Kubernetes).
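One way to make this constraint checkable is a per-node capacity test. The sketch below assumes hypothetical `Node` and `Workload` records carrying CPU and memory requests (the units, cores and MiB, are also assumptions). A workload may only be placed on a node that still has room on both dimensions, and the cluster is healthy when no node is over-committed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workload:
    name: str
    cpu: float     # requested CPU (assumed unit: cores)
    memory: float  # requested memory (assumed unit: MiB)


@dataclass
class Node:
    name: str
    cpu: float     # allocatable CPU
    memory: float  # allocatable memory
    workloads: List[Workload] = field(default_factory=list)

    def free_cpu(self) -> float:
        return self.cpu - sum(w.cpu for w in self.workloads)

    def free_memory(self) -> float:
        return self.memory - sum(w.memory for w in self.workloads)

    def can_host(self, w: Workload) -> bool:
        # Both CPU and memory must fit; either one alone is not enough.
        return w.cpu <= self.free_cpu() and w.memory <= self.free_memory()


def cluster_is_healthy(nodes: List[Node]) -> bool:
    """Invariant: no node is over-committed on CPU or memory."""
    return all(n.free_cpu() >= 0 and n.free_memory() >= 0 for n in nodes)
```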

Process

We assume we have a list of all nodes, which the node plan iterates over.

Node Plan Construction

stateDiagram-v2
	antnp: Add Node to Node Plan
	atnwlitn?: Are there no workloads left in the node?
	can: Choose a Node
	lf?: Is the loop finished?
	cwmthcuwttnwtmfc?: Can we move the highest-CPU workload to the node with the most free CPU?
	inpe?: Is Node Plan Empty?
	mtw: Move the workload
	onbnof: Order Nodes By Number of Workloads
	onp: Output Node Plan
	sap: Start App Plan
	snp: Save Node Plan
	uawcitn: Undo all workload changes in this node

	state is_node_without_workloads <<choice>>
	state is_loop_finished <<choice>>
	state is_node_plan_empty <<choice>>
	state is_workload_move_possible <<choice>>

	[*] --> onbnof
	onbnof --> lf?

	lf? --> is_loop_finished
	is_loop_finished --> onp: Yes
	is_loop_finished --> can: No

	cwmthcuwttnwtmfc? --> is_workload_move_possible
	is_workload_move_possible --> mtw: Yes
	is_workload_move_possible --> uawcitn: No
	
	uawcitn --> lf?
	mtw --> atnwlitn?
	onp --> inpe?

	can --> atnwlitn?
	atnwlitn? --> is_node_without_workloads
	is_node_without_workloads --> antnp: Yes
	is_node_without_workloads --> cwmthcuwttnwtmfc?: No

	antnp --> lf?

	inpe? --> is_node_plan_empty
	is_node_plan_empty --> sap: Yes
	is_node_plan_empty --> snp: No

	snp --> [*]
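The node plan diagram translates fairly directly into a greedy loop. This is a sketch under assumptions the diagram does not state: nodes are ordered by ascending workload count (fewest first), candidate targets are the other nodes not already in the plan, and the fit check also looks at memory because of the Challenge above, even though the diagram only names CPU. `Node`, `Workload` and `build_node_plan` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Workload:
    name: str
    cpu: float     # requested CPU (assumed unit: cores)
    memory: float  # requested memory (assumed unit: MiB)


@dataclass
class Node:
    name: str
    cpu: float
    memory: float
    workloads: List[Workload] = field(default_factory=list)

    def free_cpu(self) -> float:
        return self.cpu - sum(w.cpu for w in self.workloads)

    def free_memory(self) -> float:
        return self.memory - sum(w.memory for w in self.workloads)


def build_node_plan(nodes: List[Node]) -> List[str]:
    """Return the names of DCOS nodes that can be fully drained.

    Mutates `nodes`: workloads moved off a drained node stay on their new node.
    """
    plan: List[str] = []
    # Assumption: fewest workloads first, as those nodes are cheapest to drain.
    for node in sorted(nodes, key=lambda n: len(n.workloads)):
        moves: List[Tuple[Workload, Node]] = []  # moves made for this node only
        while node.workloads:
            workload = max(node.workloads, key=lambda w: w.cpu)
            # Candidate targets: other nodes that are not being migrated away.
            targets = [n for n in nodes if n is not node and n.name not in plan]
            if not targets:
                break
            target = max(targets, key=lambda n: n.free_cpu())
            if workload.cpu > target.free_cpu() or workload.memory > target.free_memory():
                break  # does not fit on the node with the most free CPU
            node.workloads.remove(workload)
            target.workloads.append(workload)
            moves.append((workload, target))
        if node.workloads:
            # Node could not be drained: undo every move made for it.
            for workload, target in reversed(moves):
                target.workloads.remove(workload)
                node.workloads.append(workload)
        else:
            plan.append(node.name)
    return plan
```

Moves made while draining a node that ends up in the plan are kept, since the plan is only valid together with them; moves for a node that cannot be fully drained are rolled back, matching the undo step in the diagram.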

App Plan Construction

stateDiagram-v2
	aatap: Add App to App Plan
	caa: Choose an App
	dkhertra?: Does Kubernetes have enough resources to run the app?
	atatio?: Are there more apps to iterate over?
	iape?: Is the App Plan Empty?
	oabru: Order Apps by resource usage
	sap: Save App Plan
	snp: Start Node Plan

	state is_loop_finished <<choice>>
	state is_kubernetes_ready_to_run_app <<choice>>
	state is_app_plan_empty <<choice>>

	[*] --> oabru 
	
	oabru --> atatio?
	atatio? --> is_loop_finished
	is_loop_finished --> iape?: No
	is_loop_finished --> caa: Yes

	caa --> dkhertra?
	dkhertra? --> is_kubernetes_ready_to_run_app
	is_kubernetes_ready_to_run_app --> aatap: Yes
	is_kubernetes_ready_to_run_app --> atatio?: No

	aatap --> atatio?

	iape? --> is_app_plan_empty
	is_app_plan_empty --> sap: No
	is_app_plan_empty --> snp: Yes

	sap --> [*]
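A matching sketch of the app plan, with two simplifying assumptions: Kubernetes capacity is treated as a single pool of free CPU and memory (the real scheduler places pods on individual nodes, so this can overestimate what fits), and apps are ordered largest first. `App` and `build_app_plan` are hypothetical names. Each accepted app reserves its resources, so later apps see the reduced capacity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class App:
    name: str
    cpu: float     # total CPU the app requests (assumed unit: cores)
    memory: float  # total memory the app requests (assumed unit: MiB)


def build_app_plan(apps: List[App], free_cpu: float, free_memory: float) -> List[str]:
    """Greedily pick apps that fit in Kubernetes' remaining capacity.

    `free_cpu` / `free_memory` model Kubernetes as one pool (a simplification).
    """
    plan: List[str] = []
    # Assumption: largest apps first, ordered by (cpu, memory).
    for app in sorted(apps, key=lambda a: (a.cpu, a.memory), reverse=True):
        if app.cpu <= free_cpu and app.memory <= free_memory:
            plan.append(app.name)
            # Reserve the app's resources so later apps see less capacity.
            free_cpu -= app.cpu
            free_memory -= app.memory
    return plan
```

An empty result corresponds to the "Start Node Plan" branch: Kubernetes needs more nodes before any further app can move.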

Work done together with Ricardo Pereira Torres da Costa.

Further work:

  • Build a metric for the quality of a migration path
  • Build a heuristic for how far a path is from being a complete migration
  • Use these two together with the A* search algorithm to find a complete migration plan
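As a possible starting point for that last item, a generic A* search could take the quality metric as the step cost and the distance-to-completion estimate as the heuristic. The sketch below is plain textbook A* in Python. The state representation (for example a hashable snapshot of which nodes and apps sit on which system), `neighbours` and `heuristic` are all hypothetical, since neither the metric nor the heuristic exists yet. For the returned path to be the cheapest one, the heuristic must never overestimate the remaining cost.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def a_star(
    start: State,
    is_goal: Callable[[State], bool],
    neighbours: Callable[[State], Iterable[Tuple[State, float]]],
    heuristic: Callable[[State], float],
) -> Optional[List[State]]:
    """Return the cheapest path from `start` to a goal state, or None."""
    counter = itertools.count()  # tie-breaker so states never get compared
    frontier = [(heuristic(start), next(counter), 0.0, start)]
    came_from: Dict[State, Optional[State]] = {start: None}
    best_cost: Dict[State, float] = {start: 0.0}
    while frontier:
        _, _, cost, state = heapq.heappop(frontier)
        if cost > best_cost[state]:
            continue  # stale entry: a cheaper route was found later
        if is_goal(state):
            path = []
            node: Optional[State] = state
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt, step_cost in neighbours(state):
            new_cost = cost + step_cost  # step_cost: the hypothetical quality metric
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = state
                priority = new_cost + heuristic(nxt)
                heapq.heappush(frontier, (priority, next(counter), new_cost, nxt))
    return None
```

Each node plan or app plan would correspond to one edge in this search, so the greedy plans above could double as the `neighbours` generator.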