Big Jasper Reports with custom XML datasource

On one of my projects we had to create a report that shows a large amount of data on multiple Excel tabs (about 50000 entries grouped by different criteria). A couple of requirements led us to choose Jasper Reports as our report generation engine. Other requirements led us to use XML as the data source, e.g. to generate the report on the fly for different languages without wasting hard disk space.
We figured out very quickly that the default XML data source implementation based on XPath does not work for us. Given how XPath works this was no big surprise: we terminated the first test after about an hour while Jasper Reports was still generating the report. The Jasper Reports documentation suggests implementing a custom data source based on a SAX parser to solve this kind of performance problem. That is not quite right, though: what you really want is a pull parser (called SAXPullParser in this post; the sample uses an XMLStreamReader), so that the data source controls which tag is parsed next instead of being called back by the SAX parser.
These are the steps to use a SAXPullParser as part of a custom data source:

  1. Create the XML raw data format
  2. Write a custom data source
  3. Incorporate the data source into the report generation

Sample project

The attached sample project shows a complete sample implementation of a custom data source and a SAXPullParser. The XML raw data is already created and part of the test resources.

The XML format

The XML format is very simple and represents e.g. all Persons of a company:

<?xml version="1.0" encoding="UTF-8"?>
<Persons>
    <Person firstname="First" lastname="Last" mail="first.last@test.test" title="nice guy" age="47" status="married"/>
    ...
</Persons>
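
Test data in this format could be produced with a few lines of stream writer code. The following is a minimal sketch, not the generator used for the sample project: the class name PersonsXmlSample and all attribute values are made up for illustration; only the element and attribute names are taken from the format above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of the sample project.
class PersonsXmlSample {

    /** Builds a Persons document with the given number of generated Person rows. */
    static String generate(int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("Persons");
            for (int i = 0; i < count; i++) {
                // One empty element per row; all values are placeholder data.
                writer.writeEmptyElement("Person");
                writer.writeAttribute("firstname", "First" + i);
                writer.writeAttribute("lastname", "Last" + i);
                writer.writeAttribute("mail", "first" + i + ".last" + i + "@test.test");
                writer.writeAttribute("title", "nice guy");
                writer.writeAttribute("age", String.valueOf(20 + i % 40));
                writer.writeAttribute("status", i % 2 == 0 ? "married" : "single");
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            // Wrapped so that callers of this sketch need no checked exception handling.
            throw new IllegalStateException("Could not write sample XML", e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(generate(2));
    }
}
```

Writing the rows through a stream writer keeps memory usage flat, so the same approach should also work for the 50000 entry case.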

The custom data source

The custom data source extends JRAbstractTextDataSource, so the sample implementation has to implement these methods:

  • boolean next(): Determines whether or not there is another row to display
  • Object getFieldValue(JRField field): Requests the value for the given field/cell

It simply implements the Iterator pattern to feed the rows of the report one at a time.

Initialization

The constructor of the class expects an InputStream that represents the XML source. Based on that stream the class initializes an XMLStreamReader, which is basically an iterator over the XML tags.
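
The initialization described above could look roughly like the following sketch. The class name ReaderInitSketch and the open method are hypothetical, and switching off DTD and external entity support is my own assumption (report data should not need either); the sample project may configure its reader differently.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper that mirrors what the constructor of the data source does.
class ReaderInitSketch {

    /** Creates a pull reader over the stream; no element has been consumed yet. */
    static XMLStreamReader open(InputStream xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the report data needs neither DTDs nor external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            return factory.createXMLStreamReader(xml);
        } catch (XMLStreamException e) {
            // Wrapped so that callers of this sketch need no checked exception handling.
            throw new IllegalStateException("Could not open XML source", e);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Persons><Person firstname=\"First\"/></Persons>";
        XMLStreamReader reader =
                open(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // A fresh reader sits at the start of the document, before the first tag.
        System.out.println(reader.getEventType() == XMLStreamConstants.START_DOCUMENT);
        reader.close();
    }
}
```

Nothing is read ahead at this point: the reader only moves when the data source asks for the next event, which is the property that makes a pull parser a good fit here.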

boolean next()

The first call to next() iterates over the XML tags until it reaches the first Person tag and stops there. Unfortunately this means that the custom implementation knows how the XML is structured, which makes it very hard to reuse.
Every subsequent call to next() moves the current pointer to the next Person element and returns true, until the closing Persons tag or the end of the document is reached, in which case it returns false.

  String tagName = null;
  boolean isStart = false;
  while (xmlStreamReader.hasNext()) {
    int eventType = xmlStreamReader.next();
    switch (eventType) {
      case XMLEvent.START_ELEMENT:
        isStart = true;
        // intentional fall-through: start and end tags share the name check
      case XMLEvent.END_ELEMENT:
        tagName = xmlStreamReader.getLocalName();
        // check if there is still a person element left
        if (isStart && PERSON_TAG_NAME.equals(tagName)) {
          return true;
        } else if (!isStart && PERSONS_TAG_NAME.equals(tagName)) {
          // end tag of persons, nothing else to handle
          return false;
        }
        break;
      case XMLEvent.END_DOCUMENT:
        return false;
    }
    isStart = false;
  }
  // stream exhausted without finding another person element
  return false;

Object getFieldValue(JRField field)
The implementation of getFieldValue(JRField field) is very simple, because the attribute name is exactly the name that is assigned in the Jasper Report document.

return xmlStreamReader.getAttributeValue(null, field.getName());
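
To show how the two methods work together, here is a self-contained sketch that runs without Jasper Reports on the classpath. It is not the class from the sample project: PersonCursor is a hypothetical name, getFieldValue takes the field name as a plain String instead of a JRField, and parse errors are wrapped in an unchecked exception where a real data source would throw a JRException.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in for the custom data source, without the Jasper Reports types.
class PersonCursor {
    private static final String PERSON_TAG_NAME = "Person";
    private static final String PERSONS_TAG_NAME = "Persons";

    private final XMLStreamReader xmlStreamReader;

    PersonCursor(InputStream xml) {
        try {
            this.xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not open XML source", e);
        }
    }

    /** Moves to the next Person start tag; returns false once there is none left. */
    boolean next() {
        try {
            while (xmlStreamReader.hasNext()) {
                int eventType = xmlStreamReader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT
                        && PERSON_TAG_NAME.equals(xmlStreamReader.getLocalName())) {
                    return true;
                }
                if (eventType == XMLStreamConstants.END_ELEMENT
                        && PERSONS_TAG_NAME.equals(xmlStreamReader.getLocalName())) {
                    return false;
                }
            }
            return false;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML source", e);
        }
    }

    /** Returns the attribute of the current Person, or null if it is not present. */
    String getFieldValue(String fieldName) {
        return xmlStreamReader.getAttributeValue(null, fieldName);
    }

    public static void main(String[] args) {
        String xml = "<Persons>"
                + "<Person firstname=\"First\" lastname=\"Last\" age=\"47\"/>"
                + "<Person firstname=\"Second\" lastname=\"Person\" age=\"31\"/>"
                + "</Persons>";
        PersonCursor cursor =
                new PersonCursor(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        while (cursor.next()) {
            System.out.println(cursor.getFieldValue("firstname") + " " + cursor.getFieldValue("lastname"));
        }
    }
}
```

Note that getFieldValue is only valid after next() has returned true, because attributes can only be read while the reader is positioned on a start tag. In the real data source the report engine follows the same order of calls.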

Bringing it all together

We now have an XML file that contains our test persons and a custom data source that iterates over the persons one by one. It is time to bring all the pieces together and see how much this improves the report generation time.
The ReportGenerator class accepts the XML source and the template as streams, and an additional flag switches between the default and the custom data source. A simple JUnit test uses this class to run 4 report generations and measures the time each one takes. Here is the result of the JUnit test on my local machine:
Running com.jasperreports.ReportGeneratorTest
INFO  ReportGenerator - Created report with default XML data source  in 4.136 seconds.
INFO  ReportGenerator - Created report with custom datasource  in 0.684 seconds.
INFO  ReportGenerator - Created report with default XML data source  in 21.463 seconds.
INFO  ReportGenerator - Created report with custom datasource  in 3.495 seconds.
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.267 sec

The first two results use an XML source with 100 Persons, the following two process 5000 Persons. In both cases the custom data source implementation is about 6 times faster than the default implementation (0.684 vs. 4.136 seconds and 3.495 vs. 21.463 seconds), and this sample uses only a fraction of what we had to process in our project. Our worst case scenario mentioned in the beginning uses 50000 records on multiple tabs, so this improvement pays off very quickly.