Preparing Resilient Production Releases Silently with Traffic Shadowing: GoReplay
Replicating a production environment for testing takes a lot of work. It is also hard to bring databases, queues and third-party dependencies into the same state with the same data. Instead, we can silently plug a new release or service into the production environment by mirroring traffic to it. That gives us an idea of the system's behaviour and its impact.
If you have been in any of the situations below, shadowing could have caught the problem earlier and made the process smoother:
- The app worked fine in dev/stage/UAT environments but consumed lots of compute resources as soon as real traffic was sent to it
- You had to roll back a release because the database was bombarded (locks, high resource usage, slow response times, etc.) when the new release landed in production
- You could not reproduce an occasional bug caused by production traffic patterns in a non-production environment
Traffic mirroring/shadowing is a technique that copies network traffic from one route and sends it to another. The copy can be used for analysis, monitoring or troubleshooting. The normal traffic flow should not be disturbed during the process.
In a Kubernetes environment, several tools ease the process of traffic mirroring. You may be planning to move to a service mesh or already be using one. In that case, Istio provides traffic mirroring out of the box with this simple Kubernetes Gateway API HTTPRoute manifest:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: httpbin
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: httpbin
    port: 8000
  rules:
  - filters:
    - type: RequestMirror
      requestMirror:
        backendRef:
          name: httpbin-v2
          port: 80
    backendRefs:
    - name: httpbin-v1
      port: 80
This route rule sends 100% of the traffic to httpbin-v1. The RequestMirror filter specifies that you want to mirror (i.e., also send) 100% of the same traffic to the httpbin-v2 service. When traffic gets mirrored, the requests are sent to the mirrored service with "-shadow" appended to their Host/Authority header. For example, cluster-1 becomes cluster-1-shadow.
But this requires installing the Istio service mesh. That is another big shift and may be overkill if traffic mirroring is the only goal.
Next, and probably one of the most popular tools out there, is GoReplay, which works in both Kubernetes and non-Kubernetes environments.
The main GoReplay repo on GitHub (github.com/buger/goreplay) and the official site (goreplay.org) have good documentation. In this post, however, we are going to install and use it in a Kubernetes cluster.
The official release didn't work on my system, and I had to build the Docker image myself. So, I forked the repo with some changes and added k8s definitions.
The installation is pretty simple. Apply the ClusterRole, ClusterRoleBinding, ServiceAccount, a sample deployment and the GoReplay DaemonSet from the k8s folder.
With these simple command args in the DaemonSet definition, GoReplay starts printing all the requests coming to the nginx deployment:
args: ["--input-raw", "k8s://goreplay/deployments/nginx:80", "--output-stdout"]
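The traffic can be replayed to a new service or release just by changing the output config. For example, the following args replay the captured nginx traffic to httpd-service:
args: ["--input-raw", "k8s://goreplay/deployments/nginx:80", "--output-http", "http://httpd-service.goreplay.svc.cluster.local:80"]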
One thing to note: because the GoReplay DaemonSet runs on the host network, dnsPolicy: ClusterFirstWithHostNet must be set. Otherwise, DNS won't resolve and GoReplay won't work as expected when the output is an internal service such as httpd-service.goreplay.svc.cluster.local in the example above.
This runs GoReplay on every node in the Kubernetes cluster, which consumes some resources, but it makes the traffic mirroring process much simpler.
That's it for this post. I would love to hear about your experience implementing traffic mirroring.