Auto-Scaling Celery Workers with Rackspace Cloud Servers
I have been working on an automated stock trading system for some time. Part of it involves heavy number crunching, such as technical analysis and neural networks. Processing large amounts of data for thousands of stock tickers can take a very long time, so I have begun creating additional worker nodes to give me more processing power and finish the number crunching much faster.
I have created some tools to help me manage these additional worker nodes and simplify scaling the data processing workers up and down. My tools automatically create worker nodes using the Rackspace Cloud Servers API and then start my Python Celery workers to process the tasks in my RabbitMQ queue. When processing has completed and the queue is empty, the auto-scaling script spins down and destroys the worker cloud servers.
The auto-scaling tool checks the size of my RabbitMQ queue. If there are a large number of tasks in the queue, it creates a number of cloud servers using the Rackspace Cloud API and an image template I have created and saved at Rackspace Cloud.
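One way to read the queue size is to parse the output of `rabbitmqctl list_queues name messages`. The sketch below is only an illustration, not the script described here: it assumes the tasks sit in a queue named `celery` (Celery's default queue name), that `rabbitmqctl` is runnable on the machine doing the check, and the helper names are made up for this example.

```python
import subprocess


def parse_queue_depth(listing, queue_name="celery"):
    """Return the message count for queue_name, or None if it is not listed."""
    for line in listing.splitlines():
        parts = line.split()
        # Data rows look like "<name> <messages>"; banner lines are skipped.
        if len(parts) == 2 and parts[0] == queue_name and parts[1].isdigit():
            return int(parts[1])
    return None


def get_queue_depth(queue_name="celery"):
    """Ask the local RabbitMQ broker how many messages are waiting.

    Assumes rabbitmqctl is on PATH and the caller may run it.
    """
    result = subprocess.run(
        ["rabbitmqctl", "list_queues", "name", "messages"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_queue_depth(result.stdout, queue_name)
```

Keeping the parsing separate from the `rabbitmqctl` call makes it possible to check the logic against saved output without a running broker.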
Basically it works like this:
Each worker instance is built from a template image, so it has the exact same packages and code base.
If the queue size is very large, then create a bunch of workers.
If the queue size is 0, then delete the additional workers, to save money by removing instances we’re not actively using.
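The rules above can be sketched as a small decision function. The threshold, batch size and worker cap are hypothetical placeholders chosen for illustration; the post does not say what "very large" or "a bunch" mean in practice.

```python
def plan_scaling(queue_size, extra_workers, scale_up_threshold=1000,
                 batch_size=5, max_workers=20):
    """Return how many workers to add (positive) or delete (negative).

    All default values are illustrative assumptions, not the real settings.
    """
    if queue_size == 0:
        # Nothing left to process: remove every additional worker.
        return -extra_workers
    if queue_size >= scale_up_threshold:
        # Large backlog: add a batch, but never exceed the cap.
        headroom = max(max_workers - extra_workers, 0)
        return min(batch_size, headroom)
    # Small backlog: leave the current workers alone.
    return 0
```

For example, `plan_scaling(5000, 0)` asks for a batch of new workers, while `plan_scaling(0, 4)` asks for all four additional workers to be deleted.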
I’m using Fabric for Python to set up the Celery worker servers. Fabric is a very powerful tool that can run commands automatically on the newly created servers. I am also using the python-cloudservers package to interface with the Rackspace Cloud API and create the servers. After the servers are created, I’m using rsync to copy my code base over to them. Fabric then starts the Celery worker daemon on each new worker node, and the daemon takes care of the rest by processing the tasks in the RabbitMQ queue.
The script I’m using to auto-scale up and down is a custom script I have written, auto-scale.py. It runs via a crontab entry on my primary/master processing server.
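The rsync step could look roughly like the sketch below. It is not the actual deployment code: the paths, the `root` login and the function names are assumptions for illustration, and it shells out to rsync over SSH directly rather than going through Fabric.

```python
import subprocess


def build_rsync_command(local_dir, host, remote_dir, user="root"):
    """Build the rsync argument list for copying the code base to a worker."""
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = local_dir.rstrip("/") + "/"
    destination = f"{user}@{host}:{remote_dir}"
    # -a preserves permissions and timestamps, -z compresses in transit.
    return ["rsync", "-az", "-e", "ssh", source, destination]


def sync_code(local_dir, host, remote_dir, user="root"):
    """Copy the code base to a newly created worker (assumes SSH key access)."""
    subprocess.run(build_rsync_command(local_dir, host, remote_dir, user),
                   check=True)
```

Passing the command as a list avoids shell quoting problems if a path ever contains spaces.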
In Nginx, you can easily set browser caching for your images. Nginx sets the ‘Expires’ and ‘Cache-Control’ HTTP response headers on the images it serves. This allows the client’s browser to cache the images for the amount of time specified by the expires directive inside the location block.
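To make the two headers concrete, here is a small Python sketch of what a browser receives for a given cache lifetime. It only mimics the header values for illustration and is not how Nginx is implemented; the function name and the 30-day lifetime in the usage note are assumptions.

```python
import time
from email.utils import formatdate


def caching_headers(max_age_seconds, now=None):
    """Return Expires and Cache-Control values for a positive cache lifetime."""
    now = time.time() if now is None else now
    return {
        # Expires is an absolute HTTP date in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
        # Cache-Control carries the same lifetime as a relative number of seconds.
        "Cache-Control": f"max-age={max_age_seconds}",
    }
```

Calling `caching_headers(30 * 24 * 3600)` gives a `Cache-Control` value of `max-age=2592000`, which corresponds to a 30-day lifetime.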
Here’s the location block I’m using in my Nginx virtual host configuration file:
I was looking for an easy way to clone my perl modules from one server to another but I wasn’t sure how to easily list my installed perl modules. I searched around and found the instmodsh command which lists all currently installed perl modules.
I found instmodsh easy to use. Type ‘instmodsh’ in your terminal and it will bring you to an interactive shell.
Here is an example listing installed perl modules with the ‘instmodsh’ command:
Available commands are:
l - List all installed modules
m <module> - Select a module
q - Quit the program
To list all perl modules type ‘l’ in the instmodsh shell:
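For cloning modules to another server, the listing can also be captured without typing into the prompt. The sketch below is an assumption-laden illustration: it feeds ‘l’ and ‘q’ to instmodsh on standard input and assumes the module names are printed one per line; the function names are made up for this example.

```python
import re
import subprocess

# A Perl module name such as "JSON" or "Archive::Tar".
MODULE_NAME = re.compile(r"[A-Za-z_]\w*(?:::\w+)*")


def parse_module_listing(output):
    """Pick the module names out of instmodsh output, skipping prompts and help."""
    modules = []
    for line in output.splitlines():
        candidate = line.strip()
        if MODULE_NAME.fullmatch(candidate):
            modules.append(candidate)
    return modules


def list_installed_modules():
    """Run instmodsh non-interactively (assumes it reads commands from stdin)."""
    result = subprocess.run(
        ["instmodsh"],
        input="l\nq\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_module_listing(result.stdout)
```

The resulting list may include the entry ‘Perl’ for the core distribution, which would need to be filtered out before reinstalling the modules elsewhere.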