Create a Static Website at Amazon AWS using Python, S3, and Route53

Benefits of Static Web Sites

The most important benefits of static web sites are speed and security.

Fully static web sites can be delivered via content delivery networks (CDN), making them load much faster to the end user. CDN benefits include caching for your site objects and edge locations (servers) that are geographically closer to your end users.

Static web sites are also generally more secure than dynamic sites. This is because flat HTML pages expose significantly fewer attack vectors than a site backed by an application server. Popular content management systems such as WordPress and Drupal have had exploits affecting millions of web sites, and new exploits for popular application servers and CMS software are routinely discovered.

In one example, a critical vulnerability in Drupal was announced impacting 12 million websites, and any web site not patched within 7 hours was considered compromised.

A critical vulnerability in WordPress could be considered even more serious as WordPress powers an estimated 25% of all websites globally.

Static vs Dynamic

A static web site is made up of “flat” or “stationary” files that are delivered to the end user exactly as stored. Most commonly, static websites are a collection of plain .html files.

Dynamic web sites, on the other hand, are generated for the user on the fly by an application server. An example of a dynamic web site would be any WordPress site.

Setting up a Static Web Site at Amazon AWS with Python

To set up a static web site at AWS, we’ll use two of their services: S3 and Route53. S3 is an object storage service, and this is where we’ll store the files that comprise our site. Route53 is the domain name system (DNS) service, which lets you host your domain name with AWS. A third service, CloudFront, is the AWS content delivery network (CDN), with edge locations distributed throughout the world so your end users can load your site as fast as possible; the steps below do not configure it.

I’ll be using Python to demonstrate creating the static site using AWS services. AWS provides a Getting Started: Static Website Hosting tutorial if you want to manually perform these steps.

Prerequisite: Install AWS Python SDK

The examples use the AWS Python SDK (boto3) to build the static site, so you’ll want to install it.

For most people, this will typically be:

pip install boto3 awscli

Once installed, we will create an AWS configuration file with credentials and default settings such as preferred region:

aws configure

Step 1: Create S3 Bucket for a static web site

Our new static web site will be stored in AWS S3 so we’ll need to create a new bucket for the website’s files.
Creating an S3 bucket with Python is simple:


# Load the json module (used to serialize the bucket policy)
# and the AWS boto3 module
import json
import boto3

# Specify the region to create the AWS resources in
DEFAULT_REGION = "us-east-1"

# Create S3 resource
s3 = boto3.resource('s3')

# Set a bucket name which will be our domain name.
bucket_name = "demo123456.com"

# Create a new S3 bucket, using a demo bucket name
s3.create_bucket(Bucket=bucket_name)

# We need to set an S3 policy for our bucket to
# allow anyone read access to our bucket and files.
# If we do not set this policy, people will not be
# able to view our S3 static web site.
bucket_policy = s3.BucketPolicy(bucket_name)
policy_payload = {
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "Allow Public Access to All Objects",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::%s/*" % (bucket_name)
  }
  ]
}

# Add the policy to the bucket
response = bucket_policy.put(Policy=json.dumps(policy_payload))

# Next we'll set a basic configuration for the static
# website.
website_payload = {
    'ErrorDocument': {
        'Key': 'error.html'
    },
    'IndexDocument': {
        'Suffix': 'index.html'
    }
}

# Make our new S3 bucket a static website
bucket_website = s3.BucketWebsite(bucket_name)

# And configure the static website with our desired index.html
# and error.html configuration.
bucket_website.put(WebsiteConfiguration=website_payload)
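
The listing above defines DEFAULT_REGION but never passes it to create_bucket, which works because us-east-1 is the default location. As a hedged sketch, assuming you want to target another region: boto3 expects a CreateBucketConfiguration with a LocationConstraint for any region other than us-east-1. Also, on newer AWS accounts S3 Block Public Access is, as far as I know, enabled by default for new buckets, so the public-read policy may be rejected until that block is relaxed. The names create_bucket_kwargs and ALLOW_PUBLIC_POLICY are hypothetical helpers for illustration, not part of boto3.

```python
# Hypothetical helpers (not part of boto3) that only build arguments;
# nothing here talks to AWS.

# Settings for the S3 client's put_public_access_block call that keep
# ACL-based public access blocked but permit a public bucket policy.
ALLOW_PUBLIC_POLICY = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
    "RestrictPublicBuckets": False,
}


def create_bucket_kwargs(bucket_name, region="us-east-1"):
    """Build keyword arguments for boto3's create_bucket call.

    us-east-1 is the default location and is not sent as a
    LocationConstraint; every other region is.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


if __name__ == "__main__":
    print(create_bucket_kwargs("demo123456.com"))
    print(create_bucket_kwargs("demo123456.com", region="eu-west-1"))
```

With the s3 resource from the listing, this would be used as s3.create_bucket(**create_bucket_kwargs(bucket_name, DEFAULT_REGION)). If the bucket policy call is denied, s3.meta.client.put_public_access_block(Bucket=bucket_name, PublicAccessBlockConfiguration=ALLOW_PUBLIC_POLICY) before bucket_policy.put is one way to lift the block; whether you want that depends on your account's security requirements.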

Step 1.1: Create S3 Bucket for redirecting www.domain.com to root domain.com

I like to redirect “www” to the root domain, such that www.domain.com will redirect to domain.com for the user. For this to work in AWS, we’ll need to create a second bucket for the www hostname, and set the bucket to redirect.


# Load aws boto3 module
import boto3

# Specify the region to create the AWS resources in
DEFAULT_REGION = "us-east-1"

# Create S3 resource
s3 = boto3.resource('s3')

# Create a new S3 bucket, using the www demo bucket name
bucket_name = "demo123456.com"
redirect_bucket_name = "www.demo123456.com"

s3.create_bucket(Bucket=redirect_bucket_name)

# The S3 settings to redirect to the root domain,
# in this case the bucket_name variable from above.
redirect_payload = {
        'RedirectAllRequestsTo': {
            'HostName': '%s' % (bucket_name),
            'Protocol': 'http'
        }
}

# Make our redirect bucket an S3 website
bucket_website_redirect = s3.BucketWebsite(redirect_bucket_name)

# Set the new bucket to redirect to our root domain
# with the redirect payload above.
bucket_website_redirect.put(WebsiteConfiguration=redirect_payload)

Step 2: Create a Route53 Hosted zone for the domain

Now that we have created an S3 bucket and web site for our new domain, we need to add the new domain to Route53, the AWS DNS service.
In Route53, we will create a new hosted zone for our domain name and add DNS records for the root domain.com and the redirect www.domain.com to point to our corresponding S3 buckets.


# Load the AWS boto3 module
import boto3
# We'll want to generate a unique UUID later
import uuid

# Specify the region to create the AWS resources in
DEFAULT_REGION = "us-east-1"

# A mapping of hosted zone IDs to AWS regions.
# Apparently this data is not accessible via API
# http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
# https://forums.aws.amazon.com/thread.jspa?threadID=116724
S3_HOSTED_ZONE_IDS = {
    'us-east-1': 'Z3AQBSTGFYJSTF',
    'us-west-1': 'Z2F56UZL2M1ACD',
    'us-west-2': 'Z3BJ6K6RIION7M',
    'ap-south-1': 'Z11RGJOFQNVJUP',
    'ap-northeast-1': 'Z2M4EHUR26P7ZW',
    'ap-northeast-2': 'Z3W03O7B5YMIYP',
    'ap-southeast-1': 'Z3O0J2DXBE1FTB',
    'ap-southeast-2': 'Z1WCIGYICN2BYD',
    'eu-central-1': 'Z21DNDUVLTQW6Q',
    'eu-west-1': 'Z1BKCTXD74EZPE',
    'sa-east-1': 'Z7KQH4QJS55SO',
    'us-gov-west-1': 'Z31GFT0UA1I2HV',
}

# Load Route53 module
route53 = boto3.client('route53')

# Define the domain name we want to add in Route53
domain = "demo123456.com"
www_redirect = "www.demo123456.com"

# We need to create a unique string to identify the request.
# A UUID4 string is an easy to use unique identifier.
caller_reference_uuid = "%s" % (uuid.uuid4())

# Create the new hosted zone in Route53
response = route53.create_hosted_zone(
    Name=domain,
    CallerReference=caller_reference_uuid,
    HostedZoneConfig={'Comment': domain, 'PrivateZone': False})

# Get the newly created hosted zone id, used for
# adding our DNS records pointing to our S3 buckets
hosted_zone_id = response['HostedZone']['Id']

# Add DNS records for domain.com and www.domain.com
website_dns_name = "s3-website-%s.amazonaws.com" % (DEFAULT_REGION)
redirect_dns_name = "s3-website-%s.amazonaws.com" % (DEFAULT_REGION)

# Here is the payload we will send to Route53
# We are creating two DNS records:
# one for domain.com to point to our S3 bucket,
# and a second for www.domain.com to point to our
# S3 redirect bucket, to redirect to domain.com
change_batch_payload = {
    'Changes': [
        {
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': domain,
                'Type': 'A',
                'AliasTarget': {
                    'HostedZoneId': S3_HOSTED_ZONE_IDS[DEFAULT_REGION],
                    'DNSName': website_dns_name,
                    'EvaluateTargetHealth': False
                }
            }
        },
        {
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': www_redirect,
                'Type': 'A',
                'AliasTarget': {
                    'HostedZoneId': S3_HOSTED_ZONE_IDS[DEFAULT_REGION],
                    'DNSName': redirect_dns_name,
                    'EvaluateTargetHealth': False
                }
            }
        }
    ]
}

# Submit the DNS record changes to Route53
response = route53.change_resource_record_sets(
    HostedZoneId=hosted_zone_id, ChangeBatch=change_batch_payload)
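
One caveat on the website_dns_name format used above: to my knowledge, not every region's S3 website endpoint uses the dash form "s3-website-REGION". Of the regions in the S3_HOSTED_ZONE_IDS table, ap-south-1, ap-northeast-2, and eu-central-1 use a dot ("s3-website.REGION") instead. The sketch below covers only the regions listed in that table; the helper name s3_website_endpoint is hypothetical, and the AWS endpoints reference should be checked for any region not listed here.

```python
# Regions from the S3_HOSTED_ZONE_IDS table whose website endpoint is
# written with a dot ("s3-website.<region>") rather than a dash.
# Assumption: this set is based on the AWS S3 website endpoints table
# and only covers the regions that appear in this article.
DOT_STYLE_REGIONS = {"ap-south-1", "ap-northeast-2", "eu-central-1"}


def s3_website_endpoint(region):
    """Return the S3 website endpoint host name for a region."""
    separator = "." if region in DOT_STYLE_REGIONS else "-"
    return "s3-website%s%s.amazonaws.com" % (separator, region)


if __name__ == "__main__":
    for region in ("us-east-1", "eu-central-1"):
        print(region, s3_website_endpoint(region))
```

In the listing above, website_dns_name = s3_website_endpoint(DEFAULT_REGION) would replace the hard-coded format string.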


Add Content to S3 Bucket

After creating our S3 buckets and adding our domain name to Route53, we have a few remaining tasks before our new static web site is live.
First, we need to add an HTML page to our S3 bucket for our visitors to see.
We can do this with Python, or we can use a static site generator such as Jekyll to build our static site.
Here’s an example using Python:


# Load the AWS boto3 module
import boto3

# Set our domain name and bucket name
# I use the domain as the bucket name,
# such that they are the same
domain = "demo123456.com"

s3 = boto3.resource('s3')

# Very simple, basic HTML code for our landing page
payload = ("<html><head><title>%s</title></head>"
           "<body><h1>%s</h1></body></html>"
           % (domain, domain))

# Create the index.html page in S3
s3.Object(domain, 'index.html').put(Body=payload, ContentType='text/html')
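
A real site usually has more than one file, including the error.html page that the website configuration in Step 1 refers to. As a rough sketch, assuming the generated site lives in a local directory, the standard mimetypes module can pick a ContentType for each file before upload. The function names plan_uploads and upload_site are hypothetical; upload_site relies on boto3's Bucket.upload_file and is not exercised here.

```python
import mimetypes
import os


def plan_uploads(site_dir):
    """Return (local_path, s3_key, content_type) for every file under site_dir."""
    uploads = []
    for root, _dirs, files in os.walk(site_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # S3 keys always use forward slashes, whatever the local OS uses
            key = os.path.relpath(local_path, site_dir).replace(os.sep, "/")
            content_type = (mimetypes.guess_type(name)[0]
                            or "application/octet-stream")
            uploads.append((local_path, key, content_type))
    return uploads


def upload_site(site_dir, bucket_name):
    """Upload every planned file to the bucket (requires AWS credentials)."""
    import boto3  # imported here so plan_uploads works without boto3
    bucket = boto3.resource("s3").Bucket(bucket_name)
    for local_path, key, content_type in plan_uploads(site_dir):
        bucket.upload_file(local_path, key,
                           ExtraArgs={"ContentType": content_type})
```

Calling upload_site("output", "demo123456.com") would then push the whole directory; "output" is a placeholder for wherever your generator writes its files.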

Change nameservers at your domain registrar to point to AWS Route53

If we are using Route53 for our DNS services, we will need to update our nameservers at our domain name registrar to use Route53.
This step will vary from registrar to registrar and will most likely be a manual process because most registrars do not offer API access.
On the AWS Route53 side, you’ll need to get your name servers for your new hosted zone, then you’ll go to your registrar, such as Namecheap, GoDaddy, etc. and update your name server records there. Your registrar will have documentation on how to perform the necessary updates in their dashboards.
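
The name servers can also be read with Python rather than from the AWS console. As a small sketch: the response returned by create_hosted_zone in Step 2 (and by route53.get_hosted_zone(Id=hosted_zone_id)) carries a DelegationSet entry listing them for a public zone. The helper name name_servers_from_response and the sample host names below are made up for illustration.

```python
def name_servers_from_response(response):
    """Return the name servers listed in a Route53 hosted zone response."""
    return list(response.get("DelegationSet", {}).get("NameServers", []))


if __name__ == "__main__":
    # Illustrative response fragment; real values come from Route53.
    sample = {
        "DelegationSet": {
            "NameServers": ["ns-1.example.net", "ns-2.example.org"],
        }
    }
    for server in name_servers_from_response(sample):
        print(server)
```

These are the values to enter in your registrar's name server settings.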

Pelican, a static site generator written in Python

What is Pelican?

Pelican is a static site generator, written in Python. Pelican is open source and you can find Pelican on GitHub.

Pelican also supports themes and plugins. You can write your own themes and plugins, or you can download many different themes and plugins already made and ready to go.

Pelican is similar to Jekyll in that both are static site generators and easy to use. The big difference is Pelican is written in Python while Jekyll is written in Ruby. If you prefer Python syntax, like I do, Pelican may be perfect for you.

Pelican Features

Pelican has many different features and tools to help you generate your static web site. Here is a list of the most popular Pelican features:

  • Write your content directly with your editor of choice, such as vim or Sublime Text, in reStructuredText, Markdown, or AsciiDoc formats.
  • Includes a simple CLI tool to run a development/testing web server and (re)generate your site.
  • Easy to interface with version control systems and web hooks such as GitHub.
  • Completely static output is easy to host anywhere. I use Rackspace Cloud Files CDN.
  • Built in support for Articles (such as blog posts) and Pages (such as “About”, “Projects” and “Contact” pages).
  • Theming support using Jinja2 templates.
  • Code syntax highlighting.
  • Atom and RSS feeds.
  • PDF generation of the articles/pages (optional).
  • Comments, via an external service such as Disqus.
  • Publication of articles in multiple languages.
  • Import your existing site from WordPress, Dotclear, or RSS feeds.
  • Integration with external tools: Twitter, Google Analytics, etc. (optional).

Read More about Pelican

For more information about Pelican, take a look at the Pelican Blog, the Pelican code on GitHub, and the Pelican Documentation.

Jekyll with Clean URLs Hosted at Rackspace Cloud Files

I’ve been using Jekyll to generate static web sites and then hosting them on the Rackspace Cloud Files CDN which uses Akamai’s content delivery network (CDN).

With Rackspace Cloud Files I have a CDN-enabled container that is configured to serve static web site files. This means I can use Rackspace Cloud Files with the Akamai CDN to serve all my static web sites without running or managing my own web servers. I simply use Cloud Files to store and serve my site. As a bonus, my site is served to visitors very fast and can easily handle a very large number of visitors. Basically, my static site can handle web scale traffic.

What are Clean URLs?

I’m an advocate of using clean URLs, or human-readable URLs, in my sites. Clean URLs have many benefits:

  • Search engine optimization
  • Improved usability
  • Improved accessibility
  • Simplifies URLs
  • Easier to remember URLs
  • Do not contain implementation details of your site (Example: no php / html / asp / etc extensions on the URL)

Here’s an example of an un-clean URL:

http://www.domain.com/category/post-name-here.html

And here’s an example of a clean URL:

http://www.domain.com/category/post-name-here

Notice there is no .html and the URL looks better. Cleaner.

What is Jekyll?

Jekyll is a simple, blog-aware, static site generator written in Ruby. It lets you create text-based posts and pages and a default layout that will be used across all of your posts. You can change the look and feel of your site by modifying your default template and then regenerating your site, and the changes will be applied to all of your blog posts. Jekyll also generates static files that you can use on your CDN or host yourself on your own server.

Jekyll does not create clean URLs by default, however. It will append .html to the file name and reference URLs with the .html suffix. Not ideal for a clean URL.

How To Get Clean URLs with Jekyll

I’m using a Jekyll plugin which rewrites the file name and URL reference so that the .html suffix is not included. It turns your blog-post.html file name into “blog-post” without the .html extension.

To use clean URLs with Jekyll, you’ll need to set your permalink format in your Jekyll _config.yml and use a Jekyll plugin to generate your web site files without the .html extension.

Here is the _config.yml permalink structure I use for my site:

permalink: /:categories/:title

This will create a friendly URL in the form of: http://www.domain.com/articles/my-awesome-article

If you don’t want to display the category in the URL, you can change the permalink to:

permalink: /:title

And this will create a URL in the format of: http://www.domain.com/my-awesome-article

Check out the Jekyll plugin I’m using on my GitHub here: jekyll-rackspace-cloudfiles-clean-urls

Rackspace Cloud Files with Jekyll and Clean URLs

I came across another problem: Rackspace Cloud Files does not know what type of file “blog-post” is because it has no file extension. When you browse to a clean URL on my CDN-hosted site, your browser would try to download the file instead of rendering it as HTML. The reason is that Cloud Files can’t peer inside the file, see that it’s all HTML code, and apply the correct content type. I needed to set the content type myself and tell Rackspace Cloud Files that “blog-post” is type “text/html” so that a web browser can properly display it.

In order to solve this problem I have written a Python helper script to apply the “text/html” content type automatically for my Jekyll-generated sites. The script uploads my site to Rackspace Cloud Files and checks each uploaded file to see whether it is an HTML file. If it is, the script tells Cloud Files the file is type “text/html”, allowing browsers to render it properly.
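
To give a feel for the idea, here is a minimal sketch of the detection step only. It is not the helper script itself, and it assumes a file counts as HTML when it has no recognizable extension and its first bytes look like an HTML document. The function name guess_content_type is hypothetical, and the upload to Cloud Files is left out.

```python
import mimetypes


def guess_content_type(path):
    """Pick a content type, sniffing extensionless files for HTML."""
    content_type = mimetypes.guess_type(path)[0]
    if content_type:
        return content_type
    # No usable extension: look at the start of the file instead
    with open(path, "rb") as handle:
        head = handle.read(512).lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return "application/octet-stream"
```

A file named “blog-post” that begins with an HTML doctype would be reported as “text/html”, which is the value the upload step then needs to set on the object.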

Download my Cloud Files / Jekyll helper script from my GitHub: jekyll-rackspace-cloudfiles-clean-urls

Experimenting with Sedo Parking

What Is Sedo Parking?

SedoParking.com allows you to park your unused domain names and earn money. Sedo also allows you to sell your domain names with the asking price of your choice.

Each visitor to your domain name hosted at Sedo Parking will see a “parking” page full of advertisements and a search box. You can select the keywords you want your domain associated with (eg. Trading, Business, etc.), as well as 3 categories your domain name or web site should be listed under.

Search Engine Optimization (SEO)

What is Search Engine Optimization (SEO)?

Search Engine Optimization is the art of getting high placement and rankings on search engines. Higher placement means your website will come up higher in the results pages. More web traffic will be driven to a site with high rankings. More traffic means more money.