Self-hosted scraping? No problem!

Do you want to scrape data from a website, but you are not ready to deploy your parser to the cloud and pay for page requests and bandwidth? Then a more appropriate option for you may be to download the program, run it on your own computer or server, and get the data in the desired format.

This can be easily accomplished with Diggernaut’s new service options. You can now compile a digger (create an executable program) for Windows, Linux, and Mac. This lets you run your diggers outside of our cloud, on your own computer or server, and save your Diggernaut account resources, because you spend none when you self-host your digger.

In addition, compiled diggers take up very little space (about 20 MB) and consume very few computer resources (roughly 10–30 MB of RAM and 1–3% of the CPU).

The compilation service will be absolutely free during the beta phase. After the release, our paid subscribers will keep using it for free, while free users will be able to buy compilation time.

How does it work? Everything is very simple. You create a digger (or use one you already have) and write a configuration for it. As you probably already know, there are three ways to do this: use our Excavator app, write a configuration in our meta-language, or hire one of our developers or a third-party developer to create the configuration for you. After you create or receive a digger configuration, save it to the digger. Then launch the digger in debug mode to make sure it works properly. While the digger is in debug mode, resources are free, so you need not worry about spending all the resources you have. If the digger is working properly and the collected data is in good shape, you can proceed with compilation.
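
As a quick illustration of the meta-language mentioned above, here is a rough, purely illustrative sketch of what a small digger configuration might look like. The command names and structure below are based on typical examples and may not match the current syntax exactly; see the Diggernaut documentation for the authoritative reference.

```yaml
---
config:
    agent: Firefox                 # user agent the digger presents to the site
    debug: 2                       # verbose logging, useful while in debug mode
do:
- walk:
    to: https://www.example.com/catalog    # hypothetical page to fetch
    do:
    - find:
        path: div.product h2               # CSS path to each product title
        do:
        - parse                            # extract the element's text
        - object_new: product              # start a new data object
        - object_field:
            object: product
            field: title                   # save the parsed text as "title"
        - object_save:
            object: product                # commit the object to the dataset
```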

To do this, go to the digger list, find your digger, and click the “Options” button in the “Compile” column.

[Screenshot: compile1]

A new panel named “Compiler” will open below the digger list. It is divided into two parts. On the right side you will see a list of compiled diggers, where you can download any of them. Please note that compiled diggers are stored for 7 days and then deleted.
On the left side, you will see the compilation settings.

[Screenshot: compile2]

First, you have to choose how the digger will output the data. You can output to a file or to the console. To output data to a file, select the File option in the “Output Type” field; to output to the console, select StdOut.
If you choose output to a file, you will need to specify the file name in the “Output File Name” field.
In the Format field, you must select the format you need; four formats are currently available: Excel, CSV, JSON, and XML. Excel and CSV do not support nested data structures, so before you use them, make sure that your data is flat: the root objects have no nested objects, only fields (see the example below). If you need some other format, please contact us and we will add it if we can.
Finally, in the last field, “Platform”, you must choose the platform the digger will be compiled for. We currently support Windows, macOS, and Linux for x86 (32-bit) and x64 (64-bit). If you need any other platform, please contact us, and if the compiler supports it, we will add it as soon as possible.
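
Since Excel and CSV need flat data, here is a hypothetical pair of records showing the difference. The first is flat and can be exported to any of the four formats; the second nests objects under “offers” and can only be exported to JSON or XML.

A flat record:

```json
{"title": "Blue widget", "price": "9.99", "sku": "BW-1"}
```

A nested record:

```json
{"title": "Blue widget", "offers": [{"seller": "Acme", "price": "9.99"}]}
```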

After you configure the compiler, click the Compile button, and in a few seconds you will see the compiled digger in the table on the right side.

[Screenshot: compile3]

You can download it by clicking the “Download” link and run it on your own computer. The link is valid for 7 days, after which the compiled digger is removed.
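
Once downloaded, the digger runs with the settings you chose at compile time, so launching it is all there is to it. A minimal sketch for a Linux or macOS build (the file name mydigger and the output file output.json are hypothetical examples):

```sh
# make the downloaded file executable, then run it
chmod +x mydigger
./mydigger              # a File build writes the scraped data to output.json

# a StdOut build prints the data to the console instead,
# so you can redirect or pipe it wherever you need
./mydigger > data.json
```

On Windows, you would simply run the downloaded .exe file from the command prompt.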

What happens if the compiled scraper stops working properly, e.g. if the website changed its structure and the data is now parsed incorrectly? You can always go back to your Diggernaut account, launch the digger in debug mode, see what is wrong, fix it, and compile a revised version. Or ask one of our developers or a third-party developer to help you.

Happy scraping!
