# Weapon of Choice

Kaoru Kohashigawa



![Evan Kirby](https://images.unsplash.com/photo-1495435798646-a289417ece44?dpr=2&auto=format&fit=crop&w=400&q=80&cs=tinysrgb&crop=&bg=)<br> [Photo by Evan Kirby](http://unsplash.com/@evankirby2?utm_medium=referral&utm_campaign=photographer-credit&utm_content=creditBadge)

At Sift I work on a team responsible for the uptime of one of our core products: Workflows. Unfortunately the design of the Workflows engine is complex, difficult to debug, and unstable. GC death spirals are a common cause of outages. Leaders would disappear and reappear for no reason, resulting in leadership discrepancies that forced us to restart the entire cluster. Most of all, it was difficult to onboard new engineers: few of us are comfortable debugging on-call issues, and most people ping us whenever there's a fire.

Most of the problems stem from supporting an in-memory database and keeping state on three separate instances in sync. Moving the database to an external service would simplify a lot of things, but latency is the main concern. Some believe that under high load an external database wouldn't be able to keep up with the product's needs. I asked a few of my mentors for their opinions; most sensed latency wouldn't be an issue, but I had to get hard numbers. I can't convince people with feelings and theory.

### Getting the Requirements

Luckily for me, Sift has metrics and standards in place for how fast a Workflows engine needs to be. All I have to do is send production-like requests to the system and make sure our metrics don't get worse. Sending a bunch of requests via curl and the up arrow isn't the best way to spend my afternoon; I need a tool. But first I need to define what production traffic looks like.

1. **Uniqueness**: Requests must come in from 20+ different customers to avoid creating [hot spots on HBase](https://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#_hotspotting). Our customer ids are part of the URL, therefore the tool should be able to send requests to different URLs.
2. **Longevity**: Traffic must be spread over a period of time to simulate days of load, not just seconds.
3. **Concurrent**: Requests must be fired concurrently to simulate multiple clients.
4. **Volume**: Controlling the number of transactions per second will give me a better idea of when the new design hits a limit.
5. **JSON**: Requests must include a JSON payload, pretty typical for any service.

### Apache Benchmark (ab)

Apache Benchmark is really good at getting top performance measurements. From my understanding it will create a configurable number of connections and try to send a configurable number of requests through those connections, sending one request after another. That's great for testing your application for max performance and max load. It will then spit out a nifty output file which [R can plot](https://www.r-project.org/about.html), allowing you to quickly digest your results and share them with your team. Here's how it met my criteria (a sample invocation follows the list):

1. **Uniqueness [NOPE]**: Unfortunately ab out of the gate won't fire at multiple endpoints.
2. **Longevity [SORTA]**: The `-t` option caps the total test time, so the run "times out" once it passes that limit. You could theoretically say "send a trillion requests within 10 minutes," and if your application doesn't handle the trillion requests in that window, the benchmark run terminates early.
3. **Concurrent [YEP]**: `-c` takes a number, allowing you to configure the number of concurrent requests.
4. **Volume [SORTA]**: There is a `-n` option which lets you specify the total number of requests the test should fire; there is, however, no option to throttle the requests being fired, i.e. wait 200ms before firing the second request. You can achieve different levels of throughput by raising or lowering the concurrency level, but there's no way to throttle individual requests.
5. **JSON [YEP]**: You'll have to configure the appropriate headers, but you can specify a text file to be the body of the request, allowing you to send any sort of data.
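To make that concrete, here's a minimal sketch of the kind of ab run described above. The endpoint, customer id, and `payload.json` file are made up for illustration; only the flags come from the ab documentation:

```bash
# 50 concurrent connections firing 10,000 JSON POSTs at a single endpoint.
# -p points at the request body, -T sets the Content-Type header,
# and -g writes per-request timings (TSV) you can plot with R or gnuplot.
ab -c 50 -n 10000 \
   -p payload.json -T 'application/json' \
   -g results.tsv \
   'http://workflows.example.com/v1/customers/1234/score'
```

Swapping `-n 10000` for `-t 600` caps the run at ten minutes instead of a fixed request count (though `-t` implies an internal cap of 50,000 requests unless you also pass `-n`), which is as close as ab gets to the longevity requirement.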
### Siege

Siege is a really nifty tool that aims to mimic user behavior. There are a ton of options, and it will even download the resources on an HTML page, shedding light on how your servers will behave under load. Unfortunately the output isn't as helpful and clear as ab's, nor is there any support for graphing with R. I suppose you could grab the entire output of the tests and feed it to something (**HACK PROJECT IDEA**?!?). How does it stand up to the criteria?

1. **Uniqueness [YEP]**: You can give `siege` a text file with a list of URLs and it will randomly select one each time it fires a request. Unfortunately there is no way to programmatically create the URLs on the fly, forcing you to pre-list all possible URLs.
2. **Longevity [YEP]**: Unlike ab, siege will fire requests until the configured time expires.
3. **Concurrent [YEP]**: `-c` allows you to configure the number of open concurrent connections.
4. **Volume [SORTA]**: The `-d` option pauses each connection for a random number of seconds between 1 and the number passed. This lets you control the throughput of the system: given 30 open connections, your total transactions per second (TPS) will be around 30. That way you can avoid overwhelming the service while still controlling the throughput. I'm not entirely sure if it can handle fractions of a second, though.
5. **JSON [YEP]**: In addition to giving siege a file to use as the body, you can even give it a list of URLs with different files, allowing you to mimic different request bodies. Neat, right?

### Conclusion

Siege fits the bill for my test case. It does take a little more time to set up because it gives you so many knobs, but with all of those options siege lets you get pretty close to production-like traffic. In the end I was able to use siege to test a new design and convince my team it would work in the wild (a rough sketch of the siege setup is at the end of this post). The design is live in production and has yet to fail us, knock on wood.

### Word of Caution

If you're using these tools to benchmark user latency (requests coming from a browser), keep in mind where you're running the tools from. For example, if your application is hosted in Virginia, latency will look terrible if you run your tests on a box in Australia. Unless most of your users are in Australia, in which case, why are you hosting your application in Virginia? You can find more pitfalls and sharp edges to avoid in this great [Sonassi blog post](https://www.sonassi.com/blog/magento-kb/why-siege-isnt-an-accurate-test-tool-for-magento-performance).

## References

- Siege Manual<br> [https://www.joedog.org/siege-manual/](https://www.joedog.org/siege-manual/)
- Apache Benchmark<br> [https://httpd.apache.org/docs/2.4/programs/ab.html](https://httpd.apache.org/docs/2.4/programs/ab.html)
- Sonassi Blog Post<br> [https://www.sonassi.com/blog/magento-kb/why-siege-isnt-an-accurate-test-tool-for-magento-performance](https://www.sonassi.com/blog/magento-kb/why-siege-isnt-an-accurate-test-tool-for-magento-performance)
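Postscript: for the curious, here's a rough sketch of the kind of siege setup described above. The hostname, customer ids, payload files, and numbers are invented for illustration, so treat it as a starting point rather than the exact configuration we ran:

```bash
# urls.txt: one entry per customer so requests spread across endpoints.
# The "POST <./file.json" syntax tells siege to read the request body from that file.
cat > urls.txt <<'EOF'
http://workflows.example.com/v1/customers/1001/score POST <./payloads/1001.json
http://workflows.example.com/v1/customers/1002/score POST <./payloads/1002.json
http://workflows.example.com/v1/customers/1003/score POST <./payloads/1003.json
EOF

# 30 concurrent users for 2 hours, each pausing a random 1-3 seconds between
# requests, hitting URLs from the file in random order (--internet) and
# sending JSON bodies.
siege --concurrent=30 --delay=3 --time=2H --internet \
      --content-type="application/json" \
      --file=urls.txt
```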