Environment preparation

Google products are generally blocked inside mainland China, yet Kubeflow, as a cloud-native machine learning toolkit, is a great help to a team. For teams with no way around the firewall, building Kubeflow from domestic mirrors saves a lot of trouble. This article provides an installation procedure for Kubeflow 0.6 based on domestic Alibaba Cloud (Aliyun) mirrors.
Kubeflow has fairly high environment requirements. The official requirement is at least one worker node with a minimum of:

- 4 CPUs
- 50 GB storage
- 12 GB memory

You can still install it below these numbers, but because this is a full-package installation you will run into resource problems later on.
You also need an existing Kubernetes cluster; here I use a cluster set up with Rancher:
sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher
I chose Kubernetes 1.14; the version compatibility between Kubeflow and Kubernetes is documented on the official site. My Kubeflow version is 0.6.
Alternatively, you can create an Alibaba Cloud Kubernetes cluster directly (remember to choose version 1.14).
If you just want to install right away, skip straight to the one-click Kubeflow installation section at the end.
Installing with kustomize

Download the kustomize files

The official tutorial installs with kfctl, and kfctl essentially uses kustomize under the hood, so here I download the kustomize files directly and install by modifying the image references.
Download the official kustomize files from the kubeflow/manifests repository:
git clone https://github.com/kubeflow/manifests
cd manifests
git checkout v0.6-branch
cd <target>/base
kubectl kustomize . | tee <output file>
There are quite a few files. You can export them one by one with a script (see the sketch after the file listing below), or generate them all with the kfctl command kfctl generate all -V:
kustomize/
├── ambassador.yaml
├── api-service.yaml
├── argo.yaml
├── centraldashboard.yaml
├── jupyter-web-app.yaml
├── katib.yaml
├── metacontroller.yaml
├── minio.yaml
├── mysql.yaml
├── notebook-controller.yaml
├── persistent-agent.yaml
├── pipelines-runner.yaml
├── pipelines-ui.yaml
├── pipelines-viewer.yaml
├── pytorch-operator.yaml
├── scheduledworkflow.yaml
├── tensorboard.yaml
└── tf-job-operator.yaml
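If you prefer scripting the export over kfctl, a loop like the one below can render each component into that kustomize/ directory. This is only a minimal sketch: it assumes you run it from the parent directory of the manifests checkout and that each component keeps its kustomization.yaml under a base/ folder, which roughly matches the v0.6-branch layout; adjust the walk if your checkout differs.

```python
# Minimal sketch: render every <component>/base kustomization in the manifests
# checkout into a flat kustomize/ directory (one YAML file per component).
# Assumes kubectl >= 1.14 (which bundles `kubectl kustomize`) is on the PATH.
import os
import subprocess

os.makedirs("kustomize", exist_ok=True)

for root, dirs, files in os.walk("manifests"):
    # only render the base variant of each component
    if os.path.basename(root) == "base" and "kustomization.yaml" in files:
        component = os.path.basename(os.path.dirname(root))  # e.g. "argo"
        rendered = subprocess.run(
            ["kubectl", "kustomize", root],
            check=True, capture_output=True, text=True,
        ).stdout
        with open(os.path.join("kustomize", f"{component}.yaml"), "w") as f:
            f.write(rendered)
        print(f"exported {component}.yaml")
```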
- ambassador: microservice gateway
- argo: task/workflow orchestration
- centraldashboard: the Kubeflow dashboard page
- tf-job-operator: deep learning framework engine, a CRD built for TensorFlow whose resource kind is TFJob
- katib: hyperparameter server
Modify the kustomize files

Modify the kustomize images

Replace the images as follows (the first list holds the original gcr.io images, the second the corresponding Aliyun mirrors):
grc_image = [
    "gcr.io/kubeflow-images-public/ingress-setup:latest",
    "gcr.io/kubeflow-images-public/admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
    "gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta",
    "gcr.io/kubeflow-images-public/centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
    "gcr.io/kubeflow-images-public/jupyter-web-app:9419d4d",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-controller:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager-rest:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-bayesianoptimization:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-grid:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-hyperband:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-nasrl:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-random:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/katib/v1alpha2/katib-ui:v0.6.0-rc.0",
    "gcr.io/kubeflow-images-public/metadata:v0.1.8",
    "gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8",
    "gcr.io/ml-pipeline/api-server:0.1.23",
    "gcr.io/ml-pipeline/persistenceagent:0.1.23",
    "gcr.io/ml-pipeline/scheduledworkflow:0.1.23",
    "gcr.io/ml-pipeline/frontend:0.1.23",
    "gcr.io/ml-pipeline/viewer-crd-controller:0.1.23",
    "gcr.io/kubeflow-images-public/notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
    "gcr.io/kubeflow-images-public/profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
    "gcr.io/kubeflow-images-public/kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
    "gcr.io/kubeflow-images-public/pytorch-operator:v1.0.0-rc.0",
    "gcr.io/google_containers/spartakus-amd64:v1.1.0",
    "gcr.io/kubeflow-images-public/tf_operator:v0.6.0.rc0",
    "gcr.io/arrikto/kubeflow/oidc-authservice:v0.2",
]

doc_image = [
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.ingress-setup:latest",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kubernetes-sigs.application:1.0-beta",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.jupyter-web-app:9419d4d",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-controller:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager-rest:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-bayesianoptimization:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-grid:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-hyperband:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-nasrl:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-random:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-ui:v0.6.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata:v0.1.8",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata-frontend:v0.1.8",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.api-server:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.persistenceagent:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.scheduledworkflow:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.frontend:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.viewer-crd-controller:0.1.23",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.pytorch-operator:v1.0.0-rc.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/google_containers.spartakus-amd64:v1.1.0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.tf_operator:v0.6.0.rc0",
    "registry.cn-shenzhen.aliyuncs.com/shikanon/arrikto.kubeflow.oidc-authservice:v0.2",
]
Modify the PVCs to use dynamic storage

Change the PVC storage to use local-path-provisioner for dynamic PV provisioning.
Install local-path-provisioner:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
To use it in Kubeflow directly, you also need to make this StorageClass the default:
...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  annotations:
    # set as the default StorageClass
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
...
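If local-path-provisioner is already installed and you would rather not edit and re-apply its manifest, the same annotation can be patched onto the existing StorageClass. A small sketch using the official kubernetes Python client (assuming a working kubeconfig for the target cluster) might look like this:

```python
# Minimal sketch: mark the existing local-path StorageClass as the cluster
# default by patching the same annotation used in the manifest above.
from kubernetes import client, config

config.load_kube_config()  # assumes a working kubeconfig for the target cluster

patch = {
    "metadata": {
        "annotations": {
            "storageclass.beta.kubernetes.io/is-default-class": "true",
        }
    }
}
client.StorageV1Api().patch_storage_class(name="local-path", body=patch)
```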
Once that is done, create a PVC to try it out:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
Note: if you did not set it as the default StorageClass, you need to add storageClassName: local-path to the PVC to bind it.
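To check the claim afterwards, something like the sketch below works (again with the kubernetes Python client; kubectl get pvc shows the same thing). Because the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the claim will stay Pending until a pod actually mounts it, which is expected.

```python
# Minimal sketch: read back the test claim and print its phase.
# With WaitForFirstConsumer the phase stays "Pending" until a pod mounts the claim.
from kubernetes import client, config

config.load_kube_config()
pvc = client.CoreV1Api().read_namespaced_persistent_volume_claim(
    name="local-path-pvc", namespace="default"
)
print(pvc.status.phase)  # "Pending" until consumed, then "Bound"
```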
One-click installation

I have packaged all of this into a one-click, domestic-mirror version of Kubeflow: https://github.com/shikanon/kubeflow-manifests