IoT Course Week 11: Load Testing

Last week, we dove into the importance of incorporating and collecting analytics through your connected device, saw how that information provides business value, and played with some of the ways it can be displayed using pretty graphs.

This Week

This week, we’ll continue our focus on non-functional requirements and start load testing. With connected devices, if the device can’t call home to its shared services, it loses a lot of its value as a smart device. These services need to be highly reliable, but things get interesting when thousands or millions of devices decide to call home at the same time.

To load test, we’ll generate concurrent usage on the system until a limit, bottleneck, unexpected behavior, or other issue is discovered. This usage should model real-life usage as closely as possible, so the analytics we put in place last week will be a valuable resource. In cases where we don’t have data to work with, we can build out user funnels and extrapolate from anticipated usage. Bad things will happen if we ship thousands of products without any idea of how our system will react under load. This data will also serve as a useful baseline for capacity planning and system optimization experiments.

The Lampi system has two shared services that we need to put under load. One is the Django web server that handles login, and the other is the MQTT broker that handles sending messages to the lamp.

Load Testing with Locust

To test the web server, we use Locust. Locust has become a LeanDog favorite due to its simple design, scalability, extensibility, and scriptability. We’ve used it to generate loads of 200,000 simultaneous users distributed across the US, Singapore, Ireland, and Brazil. These simulated users (locusts) walked through multi-page workflows with varying probabilities, modeling the end-to-end user interaction, complete with locusts dropping out of the user funnel at known decision points.

Locusts are controlled via a locustfile.py. The one below shows a user logging in and going to the home page:

from locust import HttpLocust, TaskSet, task

# Placeholders: substitute real test-account credentials
USERNAME = "{{USERNAME}}"
PASSWORD = "{{PASSWORD}}"

class UserBehavior(TaskSet):

    def on_start(self):
        self.login()

    def login(self):
        # Hit the login page first to pick up Django's CSRF token
        response = self.client.get("/accounts/login/?next=/")
        csrftoken = response.cookies.get('csrftoken', '')
        self.client.post("/accounts/login/?next=/",
                         {"csrfmiddlewaretoken": csrftoken,
                          "username": USERNAME,
                          "password": PASSWORD})

    @task(1)
    def load_page(self):
        self.client.get("/")

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    # Each simulated user waits 5-9 seconds between tasks
    min_wait = 5000
    max_wait = 9000
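
The example above keeps things simple. The weighted, multi-page workflows described earlier are modeled by giving tasks different weights; a sketch, with hypothetical paths and a hypothetical 3:1 ratio standing in for a measured drop-out rate:

from locust import TaskSet, task

class BrowseBehavior(TaskSet):
    # Relative weights approximate path probabilities: three product
    # views for every checkout attempt, mimicking funnel drop-out
    @task(3)
    def view_product(self):
        self.client.get("/products/1/")

    @task(1)
    def checkout(self):
        self.client.get("/checkout/")

A behavior like this plugs in via WebsiteUser.task_set, just like UserBehavior above.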

To run Locust, we’ll need a machine outside of the system under test to simulate a number of devices. Locust is a Python package, so it can run on most OSes. It uses a master/slave architecture, so you can distribute the simulated users across machines and generate ever-larger loads.
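
A distributed run looks something like this (flags as they were in Locust releases of this era; check locust --help for your version):

loadtest$ pip install locustio
loadtest$ locust -f locustfile.py --master
loadtest$ locust -f locustfile.py --slave --master-host=[master_ip]

Each slave connects to the master, which coordinates the swarm and aggregates the results.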

Once you install Locust and start the process, you control the test via a web interface.

Locust will aggregate the requests to a particular endpoint and provide statistics and errors for those requests.  

Load Testing with Malaria

To test MQTT, we used a fork of Malaria. Malaria was designed to exercise MQTT brokers. Like Locust, Malaria spawns a number of processes to publish MQTT messages. Unlike Locust, it’s not easy to script; you have to fork it to do parametric testing or randomize data.

usage: malaria publish [-D DEVICE_ID] [-H HOST] [-p PORT] [-n MSG_COUNT] [-P PROCESSES]

Publish a stream of messages and capture statistics on their timing

optional arguments:
-D DEVICE_ID (Set the device id of the publisher)
-H HOST (MQTT host to connect to (default: localhost))
-p PORT (Port for remote MQTT host (default: 1883))
-n MSG_COUNT (How many messages to send (default: 10))
-P PROCESSES (How many separate processes to spin up (default: 1))

By modulating MSG_COUNT and PROCESSES you can control the load being sent to the broker.

Running Some Example Loads
Small load: Using 1 process, send 10 messages, from device id [device_id]

loadtest$ ./malaria publish -H [broker_ip] -n 10 -P 1 -D [device_id]

Produces results similar to this:

Clientid: Aggregate stats (simple avg) for 1 processes
Message success rate: 100.00% (10/10 messages)
Message timing mean 344.51 ms
Message timing stddev 2.18 ms
Message timing min 340.89 ms
Message timing max 347.84 ms
Messages per second 4.99
Total time 14.04 secs

Large load: Using 8 processes, send 10,000 messages each, from device id [device_id]

loadtest$ ./malaria publish -H 192.168.0.42 -n 10000 -P 8 -D [device_id]

Monitoring The Broker

The MQTT broker provides a set of special topics (the $SYS tree) that allow you to monitor it.

This command will show all the monitoring topics (note that the $ is escaped with a backslash):

cloud$ mosquitto_sub -v -t \$SYS/#

The subtopics under $SYS/broker/load/ are of particular interest.
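
To watch just those, subscribe to the load subtree:

cloud$ mosquitto_sub -v -t \$SYS/broker/load/#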

Gather data

Before we start testing, we should figure out what metrics we want to measure. Resources on the shared system (CPU, memory, bandwidth, file handles) are good candidates for detecting capacity issues. Focusing on the user experience (failure rate, response time, latency) will help you home in on the issues that will incur support costs or retention problems. Building the infrastructure to gather, analyze, and visualize those metrics can be a significant part of the load testing process, but those tools are also necessary for useful operational support in production. For the class, students used sysstat, Locust, the MQTT $SYS topics, and Malaria to gather metrics. A production-like system might use AWS CloudWatch, New Relic, Nagios, Cacti, Munin, or a combination of other excellent tools.
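
For example, sysstat’s sar can sample CPU utilization once a second (here, for a 60-second window) and log it for later analysis:

cloud$ sar -u 1 60 > cpu_during_test.log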

The point of load testing is to find the limits and then decide what to do about them. There will be a point where the cost to rectify an issue is greater than any immediate benefit; load testing will help you find that bar. During the class, limits of about 1,000 simultaneous users for the web server and 5,000-10,000 MQTT messages per process were common.

Final project

For their final project, two students from the class, Matthew Bentley and Andrew Mason, decided to take on some of the problems with mqtt-malaria and extend Locust to publish MQTT messages. Using Locust, they were able to scale their load-test infrastructure across many machines and put a broker under more stress. In their previous testing with Malaria, they found the point where a single device could send no more messages (at a reasonable rate), but they could not scale Malaria to determine at what point the broker would stop processing additional connected devices’ messages. Through their efforts, they reached 100% CPU on the broker, pushing 1 million messages a minute to 4,000 users. As a result of their work, they also open-sourced their contribution to Locust.
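
Their published code is the definitive reference; as a rough sketch of the approach (using the Locust and paho-mqtt APIs of the era, with a hypothetical topic name), an MQTT-publishing locust looks something like this:

import time

import paho.mqtt.client as mqtt
from locust import Locust, TaskSet, events, task

class MQTTBehavior(TaskSet):

    def on_start(self):
        # One persistent broker connection per simulated device
        self.mqtt = mqtt.Client()
        self.mqtt.connect("192.168.0.42", 1883)
        self.mqtt.loop_start()

    @task
    def publish(self):
        start = time.time()
        # QoS 1: block until the broker acknowledges the message
        info = self.mqtt.publish("devices/lamp", '{"on": true}', qos=1)  # hypothetical topic
        info.wait_for_publish()
        elapsed_ms = (time.time() - start) * 1000
        # Report timing through Locust's stats, just like an HTTP request
        events.request_success.fire(request_type="mqtt", name="publish",
                                    response_time=elapsed_ms, response_length=0)

class MQTTUser(Locust):
    task_set = MQTTBehavior
    min_wait = 1000
    max_wait = 2000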

IoT Course Week 10: Analytics

Last week we got our feet wet with an introduction to Bluetooth Low-Energy on iOS. This week, we’ll dive into analytics, provide business value, and make some pretty graphs.

Why Analytics?

When building a new product, there are always a variety of options on the table with which to improve that product. At LeanDog, we practice a software development cycle that includes short sprints coupled with an open and honest feedback loop that provides us with the information we need to make informed decisions about where to focus our efforts and resources. This allows us to make sure that we are building the right thing the first time and minimize the amount of risk inherent in the process.

Until relatively recently, collecting feedback about a product in use was a long process that required either direct observation or careful reading of written user reviews and complaints. Due to the complex and inconsistent nature of users, collecting strong quantitative data about a product experience can be difficult. In a now-infamous incident from 2013, a New York Times journalist wrote a negative review of the Tesla Model S, only to have the car’s onboard analytics refute many of his claims. It is not uncommon for a customer to report one thing but end up doing something entirely different, and your user experience process will need to account for these inconsistencies. One of the many ways we solve that problem is through the use of analytics platforms and reporting tools.

In addition to uncovering potential pitfalls, analytics are a powerful way for product owners, designers, and developers to understand how a product is actually used. For companies that make physical devices, this provides insights that are difficult to collect otherwise. Imagine receiving a coupon in the mail for a smart GE light bulb you love that’s nearing the end of its lifetime. The only way GE could possibly anticipate that your current bulb is about to go out (without calling you every day to ask how often you turned it on in the last 24 hours) is through analytics. With analytics, you get an avenue outside of sales to figure out which features and products your users actually love, which have problems or aren’t worth further development, and even identify disengaged users for retention campaigns.

Enter Keen IO
For this class, we will use a popular analytics platform called Keen IO. Keen is a general-purpose tool, not locked into web, mobile, or embedded specifics. It has a large number of supported software development kits (SDKs), including Ruby, iOS, Python, .NET, and more. It also offers a powerful free tier, which is perfect for the amount of traffic currently generated by students’ LAMPi systems. Registering a client and sending an event in Python is as simple as this:

from keen.client import KeenClient

client = KeenClient(
    project_id="xxxx",
    write_key="yyyy",
)

client.add_event("sign_ups", {
    "username": "lloyd",
    "referred_by": "harry"
})

This will send an event containing the signup data to Keen’s database. Now back at LAMPi headquarters we can track those signups on a giant web dashboard:

var series = new Keen.Query("count", {
    eventCollection: "sign_ups",
    timeframe: "previous_7_days",
    interval: "daily"
});

client.draw(series, document.getElementById("signups"), {
    chartType: "linechart",
    label: "Sign Ups",
    title: "Sign Ups By Day"
});

Keen also provides a number of ways to pull the analytics data back out and do additional processing to get exactly the view we want. For example, we could build a tree of who our top referrers are and what their “networks” look like.
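
As a sketch of what that post-processing could look like in Python (assuming the SDK’s extraction endpoint and a read key; the grouping logic is ours, not Keen’s):

from collections import defaultdict

from keen.client import KeenClient

client = KeenClient(
    project_id="xxxx",
    read_key="zzzz",  # extractions use a read key rather than a write key
)

# Pull the raw sign_ups events back out of Keen
events = client.extraction("sign_ups", timeframe="previous_30_days")

# Group new users under whoever referred them: the first level of the tree
tree = defaultdict(list)
for event in events:
    tree[event["referred_by"]].append(event["username"])

# Referrers with the most sign-ups come first
top_referrers = sorted(tree, key=lambda who: len(tree[who]), reverse=True)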

What’s next?
Analytics can also provide a leading indicator to help model the number of users that will be pounding on your infrastructure. To learn more about how to address that issue, join us next week when we talk about load testing!