Goal
Migrate the nodes of a DCOS system to a Kubernetes system without downtime. This requires migrating nodes and apps in turns, alternating between two plans:
- nodePlan: list of nodes that can be migrated from DCOS to Kubernetes.
- appPlan: list of apps that can be migrated from DCOS to Kubernetes.
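The alternation can be sketched as a small driver loop. This is a minimal sketch, not the authors' implementation: `migrate`, `build_node_plan`, `build_app_plan` and `apply_plan` are hypothetical callables. Stopping once both plans come back empty in a row is an assumption, because the process does not say what happens when neither plan can make progress.

```python
from collections.abc import Callable, Sequence

# Hypothetical signatures: a builder returns the items it can migrate right now,
# and apply_plan(kind, plan) performs the migration for that plan.
Builder = Callable[[], Sequence[str]]


def migrate(
    build_node_plan: Builder,
    build_app_plan: Builder,
    apply_plan: Callable[[str, list[str]], None],
    max_rounds: int = 1000,
) -> list[tuple[str, list[str]]]:
    """Alternate between node plans and app plans until neither makes progress."""
    builders = {"node": build_node_plan, "app": build_app_plan}
    kind = "node"  # assumption: start with a node plan
    empty_in_a_row = 0
    history: list[tuple[str, list[str]]] = []
    for _ in range(max_rounds):
        plan = list(builders[kind]())
        if plan:
            apply_plan(kind, plan)
            history.append((kind, plan))
            empty_in_a_row = 0
        else:
            empty_in_a_row += 1
            if empty_in_a_row == 2:
                break  # assumption: both plans are empty, so nothing more can move
        kind = "app" if kind == "node" else "node"  # switch to the other plan
    return history
```

Switching to the other plan after every round is one reading of "alternate"; always retrying the node plan first would be an equally plausible variant.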
Challenge
- Apps must always have enough CPU and memory to run wherever they are allocated (DCOS or Kubernetes).
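This constraint can be stated as an invariant that must hold after every migration step. A minimal sketch, assuming each location (a DCOS node or a Kubernetes node) has a fixed CPU and memory capacity and each app a fixed request; `Location` and `overcommitted` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """A DCOS node or a Kubernetes node with its capacity (hypothetical model)."""
    name: str
    cpu: float  # CPU capacity, e.g. in cores
    mem: float  # memory capacity, e.g. in GiB
    apps: dict[str, tuple[float, float]] = field(default_factory=dict)  # app -> (cpu, mem)


def overcommitted(locations: list[Location]) -> list[str]:
    """Return the names of locations whose apps need more CPU or memory than they have."""
    bad = []
    for loc in locations:
        used_cpu = sum(cpu for cpu, _ in loc.apps.values())
        used_mem = sum(mem for _, mem in loc.apps.values())
        if used_cpu > loc.cpu or used_mem > loc.mem:
            bad.append(loc.name)
    return bad
```

Running such a check against the projected state before applying a plan is one way to catch a plan that would leave an app without enough resources.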
Process
We assume we have a list of all nodes, which we iterate over.
Node Plan Construction
```mermaid
stateDiagram-v2
antnp: Add Node to Node Plan
atnwlitn?: Are there no workloads left in the node?
can: Choose a Node
lf?: Is the loop finished?
cwmthcuwttnwtmfc?: Can we move the highest CPU usage workload to the node with the most free CPU?
inpe?: Is Node Plan Empty?
mtw: Move the workload
onbnof: Order Nodes By Number of Workloads
onp: Output Node Plan
sap: Start App Plan
snp: Save Node Plan
uawcitn: Undo all workload changes in this node
state is_node_without_workloads <<choice>>
state is_loop_finished <<choice>>
state is_node_plan_empty <<choice>>
state is_workload_move_possible <<choice>>
[*] --> onbnof
onbnof --> lf?
lf? --> is_loop_finished
is_loop_finished --> onp: Yes
is_loop_finished --> can: No
cwmthcuwttnwtmfc? --> is_workload_move_possible
is_workload_move_possible --> mtw: Yes
is_workload_move_possible --> uawcitn: No
uawcitn --> lf?
mtw --> atnwlitn?
onp --> inpe?
can --> atnwlitn?
atnwlitn? --> is_node_without_workloads
is_node_without_workloads --> antnp: Yes
is_node_without_workloads --> cwmthcuwttnwtmfc?: No
antnp --> lf?
inpe? --> is_node_plan_empty
is_node_plan_empty --> sap: Yes
is_node_plan_empty --> snp: No
snp --> [*]
```
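The node-plan diagram can be read as a greedy procedure. This is a minimal Python sketch, not the authors' code. It assumes nodes are tried in ascending order of workload count, that a move is possible only when the target has enough free CPU and memory, and that only nodes not yet in the plan can receive workloads. `Workload`, `Node` and `build_node_plan` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: equal-sized workloads stay distinct
class Workload:
    name: str
    cpu: float
    mem: float


@dataclass(eq=False)
class Node:
    name: str
    cpu: float  # capacity
    mem: float  # capacity
    workloads: list[Workload] = field(default_factory=list)

    def free_cpu(self) -> float:
        return self.cpu - sum(w.cpu for w in self.workloads)

    def free_mem(self) -> float:
        return self.mem - sum(w.mem for w in self.workloads)


def build_node_plan(nodes: list[Node]) -> list[Node]:
    """Return the nodes that can be emptied; workload moves are applied to `nodes`."""
    plan: list[Node] = []
    # Assumption: nodes with fewer workloads are tried first.
    for node in sorted(nodes, key=lambda n: len(n.workloads)):
        moves: list[tuple[Workload, Node]] = []
        while node.workloads:
            workload = max(node.workloads, key=lambda w: w.cpu)
            # Assumption: nodes already in the plan are being emptied, so they cannot receive work.
            targets = [n for n in nodes if n is not node and n not in plan]
            target = max(targets, key=lambda n: n.free_cpu(), default=None)
            fits = (
                target is not None
                and target.free_cpu() >= workload.cpu
                and target.free_mem() >= workload.mem
            )
            if not fits:
                # Undo all workload changes in this node, newest move first.
                for moved, dest in reversed(moves):
                    dest.workloads.remove(moved)
                    node.workloads.append(moved)
                break
            node.workloads.remove(workload)
            target.workloads.append(workload)
            moves.append((workload, target))
        if not node.workloads:
            plan.append(node)
    return plan
```

The sketch mutates `nodes`: moves made while emptying a node that ends up in the plan are kept, while moves for a node that could not be emptied are rolled back.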
App Plan Construction
```mermaid
stateDiagram-v2
aatap: Add App to App Plan
caa: Choose an App
dkhertra?: Does Kubernetes have enough resources to run the app?
atatio?: Are there more apps to iterate over?
iape?: Is the App Plan Empty?
oabru: Order Apps by resource usage
sap: Save App Plan
snp: Start Node Plan
state has_more_apps <<choice>>
state is_kubernetes_ready_to_run_app <<choice>>
state is_app_plan_empty <<choice>>
[*] --> oabru
oabru --> atatio?
atatio? --> has_more_apps
has_more_apps --> iape?: No
has_more_apps --> caa: Yes
caa --> dkhertra?
dkhertra? --> is_kubernetes_ready_to_run_app
is_kubernetes_ready_to_run_app --> aatap: Yes
is_kubernetes_ready_to_run_app --> atatio?: No
aatap --> atatio?
iape? --> is_app_plan_empty
is_app_plan_empty --> sap: No
is_app_plan_empty --> snp: Yes
sap --> [*]
```
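A matching sketch for the app plan, again hypothetical (`App` and `build_app_plan` are illustrative names). It treats Kubernetes' free CPU and memory as a single pool, ignoring per-node packing, and orders apps largest first. It also deducts each accepted app's resources so that later apps only see what is left; the diagram does not show this reservation step, so it is an assumption.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    cpu: float
    mem: float


def build_app_plan(apps: list[App], free_cpu: float, free_mem: float) -> list[App]:
    """Pick the DCOS apps that fit in Kubernetes' free CPU and memory."""
    plan: list[App] = []
    # Assumption: "resource usage" means (CPU, memory), largest apps first.
    for app in sorted(apps, key=lambda a: (a.cpu, a.mem), reverse=True):
        if app.cpu <= free_cpu and app.mem <= free_mem:
            plan.append(app)
            # Assumption: reserve the app's resources so the plan never overcommits.
            free_cpu -= app.cpu
            free_mem -= app.mem
    return plan
```

Ordering largest first frees the most DCOS capacity per migrated app; ordering smallest first would instead fit more apps into each plan. "By resource usage" supports either reading.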
Done in collaboration with Ricardo Pereira Torres da Costa.
Further work:
- Build a metric for the quality of a migration path
- Build a heuristic for how far a path is from a complete migration
- Use these two together with the A* algorithm to find a complete migration path
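As a possible starting point for this further work, here is a generic A* search. It is a hypothetical sketch. A migration state would need to be hashable (for example a frozenset of already-migrated nodes and apps). `neighbours` would yield the states reachable by applying one node plan or app plan, each paired with a step cost taken from the path-quality metric. `heuristic` would be the distance-to-complete-migration estimate, and it must never overestimate the remaining cost for A* to return an optimal path.

```python
import heapq
from collections.abc import Callable, Hashable, Iterable
from itertools import count
from typing import Optional

State = Hashable


def a_star(
    start: State,
    is_goal: Callable[[State], bool],
    neighbours: Callable[[State], Iterable[tuple[State, float]]],
    heuristic: Callable[[State], float],
) -> Optional[list[State]]:
    """Return the cheapest path from start to a goal state, or None if there is none."""
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if cost > best_cost[state]:
            continue  # stale entry: a cheaper path to this state was found later
        for nxt, step in neighbours(state):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    frontier, (new_cost + heuristic(nxt), next(tie), new_cost, nxt, path + [nxt])
                )
    return None
```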