Comparing Response Data Extractors

This experiment compares the performance of various JMeter response data extractors.

We decided to write this post as a reaction to a great comparison of JMeter response data extractors by Vinoth Selvaraj aka Test Automation Guru.

We repeated his experiment using SmartMeter.io instead of JMeter and added two more extractors: the Boundary Body extractor and the JSON extractor.

As Vinoth shows in his experiment, the Regular Expression Extractor performed best of the extractors he tested. It is also the most powerful one, thanks to its use of regular expressions for searching text.

The Boundary Body extractor is not as powerful as the Regular Expression Extractor. It is, however, much easier to use, especially if you are not a regex fan. Thanks to its simplicity, it should also be faster than the Regular Expression Extractor.

Test Plan

We use a test plan similar to Vinoth's: one Thread Group containing one Dummy Sampler. The response is a dummy XML document taken from W3Schools.

As in the original test, latency and response times are turned off, and there are no timers. The complete setup is available in the GitHub repository.

Test data

For the JSON extractor tests, we converted the XML to JSON. The response data in both XML and JSON formats can be found in the repository.

Our XML is a catalog of CDs, each with TITLE, ARTIST, COUNTRY, COMPANY, PRICE, and YEAR elements.

We extract the first TITLE in the CD list, so the expected value is "Empire Burlesque".

The test plan is the same for all tests; the only difference is the extractor applied.
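To make the extraction target concrete, here is a minimal sketch of pulling the same value out with XPath using only the JDK's built-in javax.xml APIs. The expression /CATALOG/CD[1]/TITLE and the class name XPathTitle are our assumptions for illustration, not necessarily what the XPath extractor test used. Note that the whole response is parsed into a DOM tree before the query runs, which plausibly contributes to XPath's poor showing in the results.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTitle {
    // Trimmed copy of the test XML (first CD only); the real response has more CDs.
    static final String XML =
        "<CATALOG><CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST>"
        + "<COUNTRY>USA</COUNTRY><COMPANY>Columbia</COMPANY>"
        + "<PRICE>10.90</PRICE><YEAR>1985</YEAR></CD></CATALOG>";

    // Hypothetical helper: returns the text of the first CD's TITLE element.
    static String firstTitle(String xml) {
        try {
            // The entire document is parsed into a DOM tree before querying.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // [1] selects the first CD element (XPath positions are 1-based).
            return xpath.evaluate("/CATALOG/CD[1]/TITLE", doc);
        } catch (Exception e) {
            // Parser and XPath errors are checked exceptions; rethrow unchecked for brevity.
            throw new IllegalStateException("XPath extraction failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTitle(XML));
    }
}
```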

Testing environment

We used an Amazon EC2 t2.micro instance running Ubuntu (AMI ID ami-af455dc9) for this testing.

That way we got the environment up quickly. It also lets us repeat the test at any time under the same conditions and eliminates outside influences such as desktop applications.

We installed SmartMeter.io version 1.4 on this machine, together with all plugins required by the Dummy Sampler and the extractors.

Results

After each test finished, we generated a report and collected the following data for comparison:

Extractor       Count (samples/60 s)   Throughput (samples/s)   % of no-extractor count
No extractor    3154374                52587.8                  100
Boundary Body   1811957                30212.4                  57
Regex           1438485                23983.1                  46
JSON Path       815584                 13597.4                  26
JSON            761193                 12691.0                  24
jQuery          216559                 3610.6                   7
XPath           41963                  699.5                    1

This graph captures the Count values.

Conclusion

As in Vinoth's article, we focus on Count (the number of samples sent in 60 seconds) and Rate (throughput, in samples per second). The results clearly show that the choice of extractor affects test performance.

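The test with no post-processors completed roughly 3.15 million requests. Every post-processor consumes CPU time and memory, which leaves less capacity for sending requests: even the fastest extractor, Boundary Body, reduced the count to 57% of that baseline, and the XPath extractor reduced it to about 1%.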
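The % column is each extractor's Count divided by the no-extractor Count. The sketch below recomputes it from the Count column, along with a per-second rate that assumes exactly 60 seconds; the class and method names are hypothetical. The per-second values come out slightly below the report's Throughput column (for example 52572.9 versus 52587.8 for the baseline), presumably because the report divides by the measured elapsed time, which was just under 60 seconds.

```java
public class ExtractorStats {
    // Count values copied from the results table (samples completed in one run).
    static final String[] NAMES = {"No extractor", "Boundary Body", "Regex",
                                   "JSON Path", "JSON", "jQuery", "XPath"};
    static final long[] COUNTS = {3154374, 1811957, 1438485, 815584, 761193, 216559, 41963};

    // Share of the no-extractor baseline, rounded to a whole percent.
    static long percentOfBaseline(long count, long baseline) {
        return Math.round(100.0 * count / baseline);
    }

    public static void main(String[] args) {
        long baseline = COUNTS[0];
        for (int i = 0; i < COUNTS.length; i++) {
            // Assumes a run of exactly 60 s, so this is an approximation of throughput.
            double perSecond = COUNTS[i] / 60.0;
            System.out.printf("%-14s %9.1f/s %3d%%%n", NAMES[i], perSecond,
                              percentOfBaseline(COUNTS[i], baseline));
        }
    }
}
```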

 

According to the results, we should use the Boundary Body extractor whenever possible. Not only is it the fastest extractor we tested, it is also very simple to use: just find the strings that surround your "needle" in the "haystack". For example, if we have the XML from our test

<CATALOG>
    <CD>
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>1985</YEAR>
    </CD>
</CATALOG>

and we want to extract the contents of the <TITLE> tag, we just define the Boundary Body extractor with <TITLE> as the left boundary and </TITLE> as the right boundary.
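As a rough sketch of the underlying idea (not SmartMeter.io's actual implementation), boundary extraction boils down to two plain substring searches: find the left boundary, then find the next right boundary after it. The between helper and its default-value parameter are hypothetical. Because it only scans for literal strings, nothing has to be compiled or backtracked, which is consistent with the speed advantage we measured.

```java
public class BoundaryExtract {
    // Hypothetical helper: text between the first `left` and the next `right`,
    // or `defaultValue` when either boundary is missing.
    static String between(String haystack, String left, String right, String defaultValue) {
        int start = haystack.indexOf(left);
        if (start < 0) {
            return defaultValue;
        }
        start += left.length();
        int end = haystack.indexOf(right, start);
        return end < 0 ? defaultValue : haystack.substring(start, end);
    }

    public static void main(String[] args) {
        String xml = "<CATALOG><CD><TITLE>Empire Burlesque</TITLE>"
                   + "<ARTIST>Bob Dylan</ARTIST></CD></CATALOG>";
        System.out.println(between(xml, "<TITLE>", "</TITLE>", "NOT_FOUND"));
    }
}
```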

For more information on how to use the Boundary Body extractor, see the SmartMeter.io documentation.

Even though the Boundary Body extractor is fast and efficient, it is unfortunately not omnipotent. One of its limitations is that it does not support non-ASCII characters.

Also, the value you need is not always surrounded by fixed, unique strings. Sometimes nothing beats a good old regex.
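Here is a minimal sketch of such a case, using java.util.regex: extract the title of the CD by a given artist. The condition on ARTIST comes after the TITLE, so a single pair of fixed left and right boundaries cannot express it, while one regex can. The second CD entry, the class name, and the titleByArtist helper are illustrative assumptions, not part of our test.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTitle {
    // Illustrative two-CD catalog; the second entry is added only for this example.
    static final String SAMPLE = "<CATALOG>"
        + "<CD><TITLE>Empire Burlesque</TITLE>\n<ARTIST>Bob Dylan</ARTIST></CD>"
        + "<CD><TITLE>Hide your heart</TITLE>\n<ARTIST>Bonnie Tyler</ARTIST></CD>"
        + "</CATALOG>";

    // Hypothetical helper: TITLE of the CD whose ARTIST immediately follows it
    // and equals `artist`; returns null when no CD matches.
    static String titleByArtist(String xml, String artist) {
        // [^<]* keeps the capture inside one TITLE element; \\s* allows line breaks.
        Pattern p = Pattern.compile(
            "<TITLE>([^<]*)</TITLE>\\s*<ARTIST>" + Pattern.quote(artist) + "</ARTIST>");
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(titleByArtist(SAMPLE, "Bonnie Tyler"));
    }
}
```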

Therefore, a sensible strategy is to use the Boundary Body extractor first and fall back to the Regular Expression Extractor only when necessary.