Friday, May 04, 2018
Announcing Kubeflow 0.1
Since Last We Met
Since the initial announcement of Kubeflow at the last KubeCon+CloudNativeCon, we have been both surprised and delighted by the excitement for building great ML stacks for Kubernetes. In just over five months, the Kubeflow project now has:
- 70+ contributors
- 20+ contributing organizations
- 15 repositories
- 3100+ GitHub stars
- 700+ commits
and is already among the top 2% of GitHub projects ever.
People are excited to chat about Kubeflow as well! The Kubeflow community has also held meetups, talks and public sessions all around the world with thousands of attendees. With all this help, we’ve started to make substantial progress in every step of ML, from building your first model all the way to building production-ready, high-scale deployments. At the end of the day, our mission remains the same: we want to let data scientists and software engineers focus on the things they do well by giving them an easy-to-use, portable and scalable ML stack.
Introducing Kubeflow 0.1
Today, we’re proud to announce the availability of Kubeflow 0.1, which provides a minimal set of packages to begin developing, training and deploying ML models. In just a few commands, you can get:
- JupyterHub - for collaborative & interactive training
- A TensorFlow Training Controller with native distributed training
- A TensorFlow Serving container for hosting trained models
- Argo for workflows
- Seldon Core for complex inference and non-TensorFlow models
- Ambassador as a reverse proxy
- Wiring to make it work on any Kubernetes anywhere
To get started, it’s just as easy as it always has been:
# Create a namespace for kubeflow deployment
NAMESPACE=kubeflow
kubectl create namespace ${NAMESPACE}
VERSION=v0.1.3
# Initialize a ksonnet app. Set the namespace for its default environment.
APP_NAME=my-kubeflow
ks init ${APP_NAME}
cd ${APP_NAME}
ks env set default --namespace ${NAMESPACE}
# Install Kubeflow components
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}
# Create templates for core components
ks generate kubeflow-core kubeflow-core
# Deploy Kubeflow
ks apply default -c kubeflow-core
And that’s it! JupyterHub is deployed, so we can now use Jupyter to begin developing models. Once we have Python code to build our model, we can build a Docker image and train our model using the TFJob operator by running commands like the following:
ks generate tf-job my-tf-job --name=my-tf-job --image=gcr.io/my/image:latest
ks apply default -c my-tf-job
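Inside each replica, the TFJob operator injects a TF_CONFIG environment variable describing the cluster layout and the replica’s own role; training code typically parses it to decide whether it is a parameter server or a worker. Here is a minimal, self-contained sketch of that parsing step — the helper name and the example cluster are ours for illustration, not part of Kubeflow:

```python
import json
import os

def parse_tf_config(env=None):
    """Parse the TF_CONFIG environment variable injected into each TFJob
    replica, returning (cluster_spec, task_type, task_index).

    Falls back to a single local worker when TF_CONFIG is absent, so the
    same script can also run outside the cluster."""
    env = os.environ if env is None else env
    raw = env.get("TF_CONFIG", "")
    if not raw:
        return {"worker": ["localhost:2222"]}, "worker", 0
    cfg = json.loads(raw)
    task = cfg.get("task", {})
    return cfg.get("cluster", {}), task.get("type", "worker"), int(task.get("index", 0))

# Example: a one-PS, two-worker cluster, as a TFJob might describe it.
example = {
    "cluster": {
        "ps": ["my-tf-job-ps-0:2222"],
        "worker": ["my-tf-job-worker-0:2222", "my-tf-job-worker-1:2222"],
    },
    "task": {"type": "worker", "index": 1},
}
cluster, task_type, task_index = parse_tf_config({"TF_CONFIG": json.dumps(example)})
print(task_type, task_index, len(cluster["worker"]))  # worker 1 2
```

The single-worker fallback is a convenience so the same training script runs unchanged on a laptop.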
We can then deploy the trained model by running:
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}
ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks apply ${ENV} -c ${MODEL_COMPONENT}
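Once the serving component is up, clients talk to TF Serving directly; in this era the default was gRPC on port 9000, while newer TF Serving versions also expose a REST predict endpoint. As a rough sketch only, here is how a client might build a REST predict request — assuming `kubectl port-forward` has mapped the serving service to localhost:8500, and with the model name, port and helper all hypothetical rather than part of Kubeflow:

```python
import json
from urllib import request

MODEL_NAME = "my-model"         # placeholder: would match ${MODEL_NAME} above
BASE = "http://localhost:8500"  # assumes kubectl port-forward to the serving pod

def predict_request(instances, model_name=MODEL_NAME, base=BASE):
    """Build (but do not send) an HTTP request against TF Serving's REST
    predict endpoint: POST {base}/v1/models/{model_name}:predict."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return request.Request(
        url="%s/v1/models/%s:predict" % (base, model_name),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = predict_request([[1.0, 2.0, 3.0]])
print(req.full_url)       # http://localhost:8500/v1/models/my-model:predict
print(req.data.decode())  # {"instances": [[1.0, 2.0, 3.0]]}
# To actually send it, the service must be reachable: request.urlopen(req)
```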
Within just a few commands, data scientists and software engineers can now create even complicated ML solutions and focus on what they do best: answering business-critical questions.
Community Contributions
It’d be impossible to have gotten where we are without enormous help from everyone in the community. Some specific contributions that we want to highlight include:
- Argo for managing ML workflows
- Caffe2 Operator for running Caffe2 jobs
- Horovod & OpenMPI for improved distributed training performance of TensorFlow
- Identity Aware Proxy, which enables securing your services with user identities, rather than VPNs and firewalls
- Katib for hyperparameter tuning
- Kubernetes volume controller, which provides basic data management using volumes and volume sources in a Kubernetes cluster
- Kubebench for benchmarking hardware and ML stacks
- Pachyderm for managing complex data pipelines
- PyTorch operator for running PyTorch jobs
- Seldon Core for running complex model deployments and non-TensorFlow serving
It’s difficult to overstate how much the community has helped bring all these projects (and more) to fruition. Just a few of the contributing companies include: Alibaba Cloud, Ant Financial, Caicloud, Canonical, Cisco, Datawire, Dell, GitHub, Google, Heptio, Huawei, Intel, Microsoft, Momenta, One Convergence, Pachyderm, Project Jupyter, Red Hat, Seldon, Uber and Weaveworks.
Learning More
If you’d like to try out Kubeflow, we have a number of options for you:
- You can use sample walkthroughs hosted on Katacoda
- You can follow a guided tutorial with existing models from the examples repository, including GitHub Issue Summarization, MNIST and Reinforcement Learning with Agents.
- You can start a cluster on your own and try your own model. Any Kubernetes-conformant cluster will support Kubeflow, including those from contributors Caicloud, Canonical, Google, Heptio, Mesosphere, Microsoft, IBM, Red Hat/OpenShift and Weaveworks.
There were also a number of sessions at KubeCon + CloudNativeCon EU 2018 covering Kubeflow. The links to the talks are here; the associated videos will be posted in the coming days.
Wednesday, May 2:
Thursday, May 3:
- Kubeflow Deep Dive - Jeremy Lewi, Google
- Write Once, Train & Predict Everywhere: Portable ML Stacks with Kubeflow - Jeremy Lewi, Google & Stephan Fabel, Canonical
- Compliant Data Management and Machine Learning on Kubernetes - Daniel Whitenack, Pachyderm
- Bringing Your Data Pipeline into The Machine Learning Era - Chris Gaun & Jörg Schad, Mesosphere
Friday, May 4:
What’s Next?
Our next major release will be 0.2 coming this summer. In it, we expect to land the following new features:
- Simplified setup via a bootstrap container
- Improved accelerator integration
- Support for more ML frameworks, e.g., Spark ML, XGBoost, sklearn
- Autoscaled TF Serving
- Programmatic data transforms, e.g., tf.transform
But the most important feature is the one we haven’t heard yet. Please tell us! Some options for making your voice heard include:
- The Kubeflow Slack channel
- The Kubeflow-discuss email list
- The Kubeflow Twitter account
- Our weekly community meeting
- Please download and run Kubeflow, and submit bugs!
Thank you for all your support so far!
Jeremy Lewi & David Aronchick, Google