Zalando began moving its infrastructure from two on-premise data centers to the cloud, requiring the migration of older applications for cloud-readiness. "We decided to have a clean break," says Jacobs. "Our
Amazon Web Services infrastructure was set up like so: Every team has its own AWS account, which is completely isolated, meaning there’s no ‘lift and shift.’ You basically have to rewrite your application to make it cloud-ready even down to the persistence layer. We bravely went back to the drawing board and redid everything, first choosing Docker as a common containerization, then building the infrastructure from there."
The company decided to hold off on orchestration at the beginning, but as teams were migrated to AWS, "we saw the pain teams were having with infrastructure and cloud formation on AWS," says Jacobs.
Zalandos 200+ autonomous engineering teams decide what technologies to use and could operate their own applications using their own AWS accounts. This setup proved to be a compliance challenge. Even with strict rules-of-play and automated compliance checks in place, engineering teams and IT-compliance were overburdened addressing compliance issues. "Violations appear for non-compliant behavior, which we detect when scanning the cloud infrastructure," says Jacobs. "Everything is possible and nothing enforced, so you have to live with violations (and resolve them) instead of preventing the error in the first place. This means overhead for teams—and overhead for compliance and operations. It also takes time to spin up new EC2 instances on AWS, which affects our deployment velocity."
The team realized they needed to "leverage the value you get from cluster management," says Jacobs. When they first looked at Platform as a Service (PaaS) options in 2015, the market was fragmented; but "now there seems to be a clear winner. It seemed like a good bet to go with Kubernetes."
The transition to Kubernetes started in 2016 during Zalando’s
Hack Week where participants deployed their projects to a Kubernetes cluster. From there 60 members of the tech infrastructure department were on-boarded - and then engineering teams were brought on one at a time. "We always start by talking with them and make sure everyone’s expectations are clear," says Jacobs. "Then we conduct some Kubernetes training, which is mostly training for our CI/CD setup, because the user interface for our users is primarily through the CI/CD system. But they have to know fundamental Kubernetes concepts and the API. This is followed by a weekly sync with each team to check their progress. Once they have something in production, we want to see if everything is fine on top of what we can improve."