Why is scaling important?
- Because incoming traffic will grow over time and models will become more complex
- Scaling dimensions
  - Data volume
  - Model parameters
Types of scaling
- Vertical
  - Increased power
  - Upgrade RAM/CPU/GPU/TPU
  - Faster storage
- Horizontal
  - More devices on the network
  - Scale up as needed
  - Scale down to minimum
- Horizontal scaling is generally preferred over vertical scaling
  - Elasticity: capacity can grow and shrink with demand
  - No need to take the service offline to add capacity
  - No hardware limit imposed by a single device
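
The "scale up as needed, scale down to minimum" behaviour can be sketched as a small function. This is only an illustration under stated assumptions: the name `desired_replicas`, its parameters, and the default bounds are hypothetical, and the proportional rule is similar in spirit to the one documented for the Kubernetes Horizontal Pod Autoscaler rather than a copy of any real implementation.

```python
import math


def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=1, max_replicas=10):
    """Return how many replicas to run so per-replica load nears target_load.

    current_load and target_load are per-replica averages in the same unit
    (for example requests per second). All names here are hypothetical.
    """
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Proportional rule: grow or shrink the replica count by observed/target load.
    raw = math.ceil(current_replicas * current_load / target_load)
    # Clamp: never drop below the minimum, never exceed the configured cap.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 60))   # load above target: scale up
    print(desired_replicas(4, 10, 60))   # load far below target: scale down
```

The clamp is what distinguishes elastic horizontal scaling from unbounded growth: the floor keeps a minimum of capacity online, and the cap bounds cost.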
Containerization
Containerization is the usual route to a highly scalable serving infrastructure
- Left – with virtual machines, each VM runs a separate guest OS on top of the hypervisor
  - The guest OS runs on the hypervisor, not directly on the hardware
- Right – a Docker container is a lightweight environment that runs on top of the Docker engine
  - Each container has its own file system
  - A Docker image is a stack of file-system layers; the lower layers are read-only (each is a snapshot of the file system of the image it depends on)
  - The host OS still runs directly on the hardware, and the containers share it
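
The layered file system described above can be pictured with a toy model. This is a rough sketch, not how Docker's storage drivers are implemented: the layer contents, the paths, and the `start_container` helper are all hypothetical. It uses `collections.ChainMap`, where lookups search the maps in order and writes only touch the first map.

```python
from collections import ChainMap

# Hypothetical read-only image layers: each dict maps a file path to its contents.
base_layer = {"/bin/sh": "shell", "/etc/os-release": "base-os"}
app_layer = {"/app/model.bin": "weights-v1"}
image_layers = [base_layer, app_layer]  # lowest layer first


def start_container(layers):
    """Give a container its own writable layer on top of the shared image layers."""
    writable = {}
    # Upper layers shadow lower ones, so search the writable layer first,
    # then the image layers from top to bottom.
    return ChainMap(writable, *reversed(layers))


c1 = start_container(image_layers)
c2 = start_container(image_layers)

# A write lands only in c1's writable layer; the shared layers stay untouched.
c1["/etc/os-release"] = "patched"
```

Because both containers reuse the same lower layers, starting a second container costs only one extra (initially empty) writable layer, which is part of why containers are lighter than full virtual machines.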
Container Orchestration
- Manage the lifecycle of container instances in production environments
- Scaling of containers
- Reliability of containers
- Containers on hot standby
- Distribute resources among containers
- Monitor health of containers
- Popular container orchestration tools
  - Kubernetes
    - Kubeflow – ML workflows on Kubernetes
      - Makes deployments of ML workflows portable and scalable
      - Can be used anywhere Kubernetes runs (on-premises or in the cloud)
  - Docker Swarm
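
What an orchestrator does with health monitoring, hot standbys, and scaling can be sketched as one pass of a control loop. This is a simplified illustration, not the Kubernetes or Docker Swarm API: the `Container` class, the `reconcile` function, and the naming of fresh containers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    healthy: bool = True


def reconcile(containers, desired_count, standby):
    """Return the running set after one control-loop pass.

    Unhealthy containers are dropped, hot standbys are promoted first,
    and new containers are started only if the standbys run out.
    """
    # Health monitoring: keep only containers that pass their check.
    running = [c for c in containers if c.healthy]
    # Reliability: fill the gap, preferring containers already on hot standby.
    while len(running) < desired_count:
        if standby:
            running.append(standby.pop(0))
        else:
            running.append(Container(name=f"new-{len(running)}"))
    # Scaling down: trim anything beyond the desired count.
    return running[:desired_count]


if __name__ == "__main__":
    fleet = [Container("a"), Container("b", healthy=False), Container("c")]
    print([c.name for c in reconcile(fleet, 3, [Container("s1")])])
```

Real orchestrators run this kind of comparison between desired and observed state continuously; the sketch shows a single pass so the decision logic stays visible.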