JSON to XML, or “transform in 6 seconds.”

Hi folks. I want to share some details about our engine. As you know, it is written in Go. We use a lot of libraries there, and one of them is mxj, an outstanding library for working with XML.

Here is a brief description of how our engine’s json2xml routine works. First, we convert the JSON to a map[string]interface{}, and then feed this object to mxj as follows: xmlValue, err := mxj.AnyXmlIndent(data, "", "", "body"). After that we fix the self-closed tags and pass the object on.

We used this mechanism for 3 months and everything was fine, until we suddenly had to parse larger volumes of JSON than usual. That turned out to be a problem: one of the diggers ran for 8 hours instead of 15 minutes. So we did the necessary research. Processing a single page took 16 minutes, which, for obvious reasons, is unacceptable. It turned out that the page contained 2.5 MB of JSON. The conversion in the mxj library took about 3 minutes, and then some kind of magic happened: the engine went crazy and took another 13 minutes to process the resulting XML. Of course, we were not happy with that, and we decided to improve mxj first.
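The first step of the routine described above, decoding JSON into a map[string]interface{}, can be sketched with the standard library alone. This is a minimal illustration, not the engine's actual code: the decodeJSON helper and the sample input are hypothetical, and the mxj call appears only as a comment so the example runs without third-party packages.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSON is a hypothetical helper (not from the engine) that turns raw
// JSON into the generic map form that is later handed to the XML writer.
func decodeJSON(raw []byte) (map[string]interface{}, error) {
	var data map[string]interface{}
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, fmt.Errorf("decode json: %w", err)
	}
	return data, nil
}

func main() {
	data, err := decodeJSON([]byte(`{"title": "example", "count": 2}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// In the engine, the map would next be passed to mxj, as in the post:
	//   xmlValue, err := mxj.AnyXmlIndent(data, "", "", "body")
	fmt.Println(data["title"], data["count"])
}
```

Note that encoding/json decodes every JSON number into a float64 when the target is an interface{}, so any code walking the map has to expect that type.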

The problem in the mxj library was that it built its output with string concatenation. Strings in Go are immutable, so each concatenation allocates a new string and copies the contents of the old one into it. To get around this we wrote a few new functions that use bytes.Buffer instead of strings. With this simple change alone we sped up XML processing in the mxj library by about 180 times: the same data set now takes less than 1 second to process, down from 3 minutes.
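The difference between the two approaches can be shown on a toy example. This is only a sketch of the technique, not the code we changed in mxj: the function names are made up for illustration, and the values are written without XML escaping to keep the example short.

```go
package main

import (
	"bytes"
	"fmt"
)

// concatElements builds the output with repeated string concatenation.
// Every += allocates a new string and copies everything written so far,
// so the total work grows quadratically with the output size.
func concatElements(tag string, values []string) string {
	out := ""
	for _, v := range values {
		out += "<" + tag + ">" + v + "</" + tag + ">"
	}
	return out
}

// bufferElements produces the same output through a bytes.Buffer, which
// appends into a growing byte slice instead of rebuilding the whole string.
func bufferElements(tag string, values []string) string {
	var buf bytes.Buffer
	for _, v := range values {
		buf.WriteString("<")
		buf.WriteString(tag)
		buf.WriteString(">")
		buf.WriteString(v)
		buf.WriteString("</")
		buf.WriteString(tag)
		buf.WriteString(">")
	}
	return buf.String()
}

func main() {
	values := []string{"a", "b", "c"}
	fmt.Println(concatElements("item", values))
	fmt.Println(bufferElements("item", values))
}
```

On three values the two functions are indistinguishable; the gap only shows up when the output reaches megabytes, as it did in our case.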

During further research we found where we had made a mistake. Our engine expects HTML, and when we work with JSON, the names of some self-closing HTML tags (img, area and so on) may show up in the XML as ordinary tags, which caused problems. So we made another change to the library that lets us replace such tags with safe versions. That solved all the issues we had, and a page that previously took 15 minutes to process now takes just 6 seconds.
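The idea of replacing such tags can be sketched as a simple name lookup. Everything here is an assumption made for illustration: the safeTagName function, the "_el" suffix and the exact list of tag names are hypothetical and do not describe the naming scheme our modified library actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// voidHTMLTags lists some HTML elements that never have a closing tag.
// An HTML-oriented parser may mishandle <img>...</img> style content.
var voidHTMLTags = map[string]bool{
	"area": true, "base": true, "br": true, "col": true,
	"embed": true, "hr": true, "img": true, "input": true,
	"link": true, "meta": true, "source": true, "wbr": true,
}

// safeSuffix is a made-up marker appended to conflicting names.
const safeSuffix = "_el"

// safeTagName returns a name that does not collide with an HTML void
// element; all other names are returned unchanged.
func safeTagName(name string) string {
	if voidHTMLTags[strings.ToLower(name)] {
		return name + safeSuffix
	}
	return name
}

func main() {
	for _, key := range []string{"img", "title", "area"} {
		fmt.Printf("%s -> %s\n", key, safeTagName(key))
	}
}
```

A function like this would be applied to each JSON key before it is written out as an element name, so that the HTML-oriented part of the engine never sees a void tag with children.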

The repository with our modified library can be found here.

As a bonus, we wrote a simple converter that lets you load data from MongoDB and convert it to XML. You can get it here.
