Aug 20, 2014 - Setting up SSI on nginx to work with Symfony 2.6

With the upcoming Symfony 2.6, SSI support is built in directly. For me this is pretty exciting, because it is my first merged PR that is more than a minor addition or a bugfix. However, setting up nginx wasn’t as smooth as expected in the end. More about that later in this post. The starting point is a plain nginx configuration for a Symfony2 application:

server {
    server_name domain.tld www.domain.tld;
    root /var/www/project/web;

    location / {
        # try to serve file directly, fallback to app.php
        try_files $uri /app.php$is_args$args;
    }

    location ~ ^/(app|app_dev|config)\.php(/|$) {
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param HTTPS off;
    }

    error_log /var/log/nginx/project_error.log;
    access_log /var/log/nginx/project_access.log;
}

The SSI implementation behaves like the long-existing ESI implementation, which means you have to enable it on the server side too [1]. As long as the server doesn’t tell the Symfony2 stack that it is able to handle SSI tags (or ESI tags), Symfony falls back to the InlineRenderer, which renders the response of the action directly into the document [2]. To announce the capability, we add a request header.

In the response, the application itself adds a header. Its purpose is the exact inverse of the header we send to the application: it allows the server to decide whether substitution is required at all. To keep it simple, this post only covers unconditional SSI: nginx tries to find and replace SSI tags even if there are none. However, we still don’t want anybody outside to see this response header, so we remove it.

location ~ ^/(app|app_dev|config)\.php(/|$) {
    ssi on;

    # Other options

    fastcgi_param HTTP_SURROGATE_CAPABILITY symfony2="SSI/1.0";
    fastcgi_hide_header Surrogate-Control; # the response header is spelled with a hyphen
}
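What the application does with that request header can be pictured in a few lines of shell. This is only a sketch, assuming that a substring check for SSI/1.0 is enough to detect the capability; the function name supports_ssi is made up for illustration and is not part of Symfony or nginx.

```shell
# Hypothetical helper: does a Surrogate-Capability value announce SSI support?
supports_ssi() {
    case "$1" in
        *SSI/1.0*) return 0 ;;
        *)         return 1 ;;
    esac
}

# The value passed via fastcgi_param in the location block above
if supports_ssi 'symfony2="SSI/1.0"'; then
    echo "emit SSI tags and let the server include the fragments"
else
    echo "fall back to inline rendering"
fi
```

Without the header (an empty value) the same check fails, which corresponds to the inline fallback described earlier.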

While this was as simple as it could be, here comes the ugly part: it doesn’t work! If there is an SSI tag, it ends up in an infinite loop…

The problem is not directly visible in our configuration, because it comes from the default fastcgi_params. This file contains (among others) this line:

# [..]
fastcgi_param REQUEST_URI $request_uri;
# [..]

Digging further, the nginx manual describes what this means:

$request_uri
    full original request URI (with arguments)
[..]
$uri
    current URI in request, normalized
    The value of $uri may change during request processing,
    e.g. when doing internal redirects, or when using index files.

nginx interprets the definition of “full original request URI” pretty strictly: $request_uri always, in every case, contains the initial URI. Everything that somehow influences the URI in use is treated by nginx as an internal subrequest, and to perform a subrequest nginx updates $uri and starts processing this new URI from the beginning. That includes rewrite, of course, try_files and even SSI, even though SSI is actually something slightly different from the rewrites mentioned before.
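The effect of handing the application the original URI on every subrequest can be illustrated with a toy model. This is not nginx and not Symfony: render and resolve are hypothetical names, the fragment path /_fragment is invented, and the depth limit of 5 only exists so that the sketch terminates.

```shell
# Toy application: the page "/" embeds a fragment, the fragment is plain text.
render() {
    case "$1" in
        /)          echo '<!--#include virtual="/_fragment" -->' ;;
        /_fragment) echo 'fragment body' ;;
    esac
}

# Follow include tags for at most 5 levels.
# mode "original": the application always sees the first URI (like $request_uri)
# mode "current":  the application sees the URI of the current subrequest
resolve() {
    local mode="$1" original="$2" current="$2" depth=0 body rest
    while [ "$depth" -lt 5 ]; do
        if [ "$mode" = original ]; then
            body=$(render "$original")
        else
            body=$(render "$current")
        fi
        case "$body" in
            *'virtual="'*)
                rest=${body#*virtual=\"}   # drop everything up to the opening quote
                current=${rest%%\"*}       # keep everything before the closing quote
                depth=$((depth + 1))
                ;;
            *)
                echo "$body"
                return 0
                ;;
        esac
    done
    echo "gave up after $depth nested includes"
    return 1
}
```

Calling resolve current / prints the fragment body after one include, while resolve original / keeps rendering the same page and gives up after five rounds.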

For our configuration above this means that every SSI subrequest back to the Symfony2 application carries the identical REQUEST_URI parameter, so the application renders the same page again, and of course the result contains the SSI tag again. I spent a long time looking for a workaround that both “works” and is not too complex. In the end I found a solution that at least works. The downside is that URIs like example.com/app.php/foo/bar (note the app.php/) don’t work anymore.

server {
    server_name domain.tld www.domain.tld;
    root /var/www/project/web;

    location / {
        set $orig_uri $uri; # <-- Remember uri during first iteration of every (sub)request
        try_files $uri /app.php$uri$is_args$args;
    }

    location ~ ^/(app|app_dev|config)\.php(/|$) {
        ssi on;

        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param HTTPS off;

        fastcgi_param REQUEST_URI $orig_uri$is_args$args; # <-- Use $orig_uri instead

        fastcgi_param HTTP_SURROGATE_CAPABILITY symfony2="SSI/1.0";
        fastcgi_hide_header Surrogate-Control; # the response header is spelled with a hyphen
    }

    error_log /var/log/nginx/project_error.log;
    access_log /var/log/nginx/project_access.log;
}

I had something in mind with a named location, but I haven’t followed it any further for now. I’ll probably come back to this later.

  1. Read more about the Edge Architecture Specification, primarily initiated by Akamai Technologies.

  2. Although SSI itself doesn’t mention using headers to control its behaviour, I’ve decided to follow the ESI specification for the Symfony2 implementation too, to avoid confusion.

May 6, 2014 - How to transform a project from a huge subversion repository to git

Disclaimer: I’ve written this post a while ago and I am not entirely sure how accurate it was at the time I stopped proof-reading it. So take it with care and always keep a backup. The information provided here might be outdated or even wrong. I hope it still helps someone get rid of overly large SVN repositories.

Multi-project Subversion repositories sound convenient at first glance: everything in one place and only one system to manage. If you ever consider switching over to git, they turn out to be a bad idea. Usually transforming a Subversion repository into a git repository is quite easy, but that only holds for repositories of a reasonable size. Multi-project repositories tend to grow quite big, and the usual transformation via git svn clone <svn-url> slows down extremely on huge repositories. I once had to convert some projects from an SVN repository that in the end had 141 more or less active projects and around 470000 commits…

A short overview of what is required

  • Time. It doesn’t take weeks anymore, but it still requires some time (in the end I got it done in around 4 hours, but only after I had created a local repository)
  • Fast storage. The whole process is extremely IO-intensive. An SSD works fine; even better, if there is enough RAM available, is a RAM disk of around 4 to 8 GB.
  • svnrdump. This has been part of Subversion since 1.7.
  • I used svn-all-fast-export for the actual conversion (it’s available in the Ubuntu repository). svn-fast-export should work too, but I had issues with that and stopped investigating.
  • BFG Repo-Cleaner to get rid of large files and files containing sensitive data.
  • Of course Subversion (especially svnadmin) and git.

Step 1: Create a local copy

If you already have access via filesystem to the repository you can skip this step. This is to create a local copy of a remote repository, so that we don’t have to perform every operation over the network.

This example downloads the history in chunks of 10000 commits, up to commit 400000. Remember to update the values so they match your repository. Downloading in chunks works around the limitations of HTTP a little bit.

# First one without "--incremental"
svnrdump dump \
    --revision 0:10000 \
    http://example.com/path/to/svn/MyProject | gzip -9 > MyProject.00.svn.gz
for i in {01..39}; do
    n=$((10#$i)) # force base 10: a bare 08 or 09 would be rejected as invalid octal
    svnrdump dump \
        --incremental \
        --revision "${n}0001:$((n + 1))0000" \
        http://example.com/path/to/svn/MyProject | gzip -9 > "MyProject.$i.svn.gz"
done

# Create local repository and import the chunks
mkdir MyProject.svn
sudo mount -t tmpfs none MyProject.svn # Skip this, if you don't want to use a ramdisk,
                                       # but you have to deal with the consequences yourself
svnadmin create MyProject.svn
for i in {00..39}; do
    gunzip < MyProject.$i.svn.gz | svnadmin load --quiet --force-uuid MyProject.svn;
done;

svnadmin dump --quiet MyProject.svn | gzip -9 > MyProject.svn/MyProject.svn.gz

mv MyProject.svn/MyProject.svn.gz /path/to/backup/

The import of the full backup into a ramdisk SVN repository is quite fast (it took me around 5 minutes), so it’s fine to just keep the backup and rebuild the repo anew after a restart.
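The revision ranges for the chunked download can also be computed instead of being glued together from zero-padded loop counters. A minimal sketch; the helper name chunk_ranges is hypothetical, and the chunk size and last revision are simply passed as arguments.

```shell
# Print "from:to" revision ranges of at most $1 revisions each, up to revision $2.
chunk_ranges() {
    local size="$1" last="$2" from=0 to
    while [ "$from" -le "$last" ]; do
        to=$(( (from / size + 1) * size ))   # end of the chunk that contains $from
        if [ "$to" -gt "$last" ]; then
            to=$last                         # the final chunk may be shorter
        fi
        echo "$from:$to"
        from=$((to + 1))
    done
}

chunk_ranges 10000 30000
# 0:10000
# 10001:20000
# 20001:30000
```

Each printed range can be passed to svnrdump dump --revision, with --incremental for every chunk except the first.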

Step 2: Prepare export

SVN tracks committers only by a username, but git by an email address with an additional, optional and arbitrary (display) name. Create a script named extract-authors.sh, paste the following code into it and make it executable. Note that you must change the path to your local SVN repository.

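#!/usr/bin/env bash
# extract-authors.sh: print one "svn-user = svn-user <svn-user@example.com>" line per committer
authors=$(svn log -q file:///absolute/path/to/MyProject.svn | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  echo "${author} = ${author} <${author}@example.com>";
done

Then run it and write the result into authors.txt:

./extract-authors.sh > authors.txt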

Fix the content. svn-all-fast-export (and as far as I know all the other tools too) expects the format svn-user-name = committer name <email-address>. It’s really easy. If you don’t know each and every name, it doesn’t matter.
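After editing the file by hand, a quick format check can catch lines that got mangled. The following is only a sketch: check_identity_map is a hypothetical helper, and the pattern assumes usernames without spaces and a single address in angle brackets.

```shell
# Hypothetical check: print every line of an identity map (read from stdin) that
# does not look like "svn-user = Display Name <email>", prefixed with its line
# number. Succeeds only if no such line exists.
check_identity_map() {
    local bad
    if bad=$(grep -Evn '^[^ =]+ = [^<>]+ <[^<> ]+@[^<> ]+>$'); then
        echo "$bad"
        return 1
    fi
    return 0
}
```

For example, check_identity_map < authors.txt prints nothing and succeeds when every line is well-formed.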

Step 3: Export

The ugliest part is creating the “rules” file. svn-all-fast-export expects a rules file that describes how to map paths in SVN to git branches and tags. For further options, see the Gitorious samples.

create repository MyProject
end repository

match /MyProject/trunk/
  repository MyProject
  branch master
end match

match /MyProject/branches/([^/]+)/
  repository MyProject
  branch \1
end match

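match /MyProject/tags/([^/]+)/
  repository MyProject
  branch tag/\1
end match

# Note: same pattern as the rule above. The first matching rule wins, so this
# one is never reached; keep only the variant you want.
match /MyProject/tags/([^/]+)/
  repository MyProject
  branch refs/tags/\1
end match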

match /
  # ignore everything we don't know (remove/comment this to find missing mappings)
end match

As you can see every mapping is prefixed with “MyProject”, because this tool isn’t able to strip the project path itself.
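Because the project name shows up in every rule, the file can also be generated for each project instead of being copied and edited. A sketch under assumptions: make_rules is a hypothetical helper, it expects the usual trunk/branches/tags layout, and it keeps only the refs/tags variant of the tag rule.

```shell
# Hypothetical generator: print a rules file for the project directory given as $1.
make_rules() {
    local project="$1"
    cat <<EOF
create repository $project
end repository

match /$project/trunk/
  repository $project
  branch master
end match

match /$project/branches/([^/]+)/
  repository $project
  branch \1
end match

match /$project/tags/([^/]+)/
  repository $project
  branch refs/tags/\1
end match

match /
  # ignore everything we don't know
end match
EOF
}
```

Running make_rules MyProject > my-rules.rules would produce a file with the same shape as the example above.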

svn-all-fast-export --identity-map=authors.txt --rules=my-rules.rules /path/to/local/svn

If everything worked fine, you should now have the bare git repository in a subfolder named MyProject. Theoretically you are finished now.

Step 4: Cleanup

First we simply drop all already merged branches. If they are already merged, you don’t need them anymore and you can re-create the branch, when you need it again.

git branch -d $(git branch --merged | grep -v '^\*') # skip the current branch, which is listed with a leading "*"
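One thing to watch when feeding git branch --merged into another command: the current branch is listed with a leading *, which an unquoted command substitution hands to the shell as a glob. A small filter makes the selection explicit. This is a sketch; deletable_branches is a hypothetical name, and protecting master is an assumption about which branch should always survive.

```shell
# Hypothetical filter: reads `git branch --merged` output on stdin and prints
# only the branch names that should be handed to `git branch -d`.
deletable_branches() {
    # drop the current branch ("* name"), strip the indentation, keep master
    sed -e '/^\*/d' -e 's/^ *//' | grep -vxF 'master' || true
}

printf '  feature-a\n* master\n  old-fix\n' | deletable_branches
# feature-a
# old-fix
```

In the repository itself this could be used as git branch --merged | deletable_branches | xargs git branch -d.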

[BFG here]

git reflog expire --expire=now --all
git gc --aggressive --prune=now

One thing that is a little bit annoying is that, for whatever reason, svn-all-fast-export exports the svn:ignore property into a .svnignore file. With some git “black magic” you can rename the file in the whole repository. I recommend doing it as the last step: it is by far the most time-consuming step when working with the git repository, but it is at least faster once the repository is already smaller.

git filter-branch --index-filter 'git ls-files -s \
    | sed "s-\(\t\"*\).svnignore-\1.gitignore-" \
    | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

However, this isn’t completely sufficient. svn-all-fast-export doesn’t convert svn:ignore properties in subfolders. A good start to fix this manually (at least in the master branch) is to run svn propget -R svn:ignore and compare the output with the .gitignore. It is also a good idea to review whether it really covers everything that should be ignored. I wouldn’t spend too much time on this, because it only affects development, and only in “living” branches.
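To see what the sed expression in the filter-branch call actually touches, it can be run against a few hand-written lines in the format of git ls-files -s. The sketch below uses the same expression, except that the tab is produced via printf so it does not depend on sed understanding \t; rename_svnignore and the sample object IDs are made up.

```shell
# The sed expression from the index filter, wrapped in a hypothetical helper.
# It rewrites an index entry only if the path right after the tab starts with
# .svnignore, i.e. the file in the repository root.
rename_svnignore() {
    local tab
    tab=$(printf '\t')
    sed "s-\(${tab}\"*\).svnignore-\1.gitignore-"
}

printf '100644 0123abc 0\t.svnignore\n'     | rename_svnignore   # path becomes .gitignore
printf '100644 0123abc 0\tsub/.svnignore\n' | rename_svnignore   # line stays unchanged
```

Entries below the root are left alone, so ignore rules for subfolders have to be handled separately anyway.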

Jan 11, 2014 - Strings are constants too

In our development team, we have a (more or less strict) rule: If it’s a constant value, make a constant out of it. In many cases this makes sense, or at least increases clarity or readability.

class Constant {
    const HOUR = 3600;
    const DEFAULT_TIMEOUT = 120;
}

(Aside: We haven’t left the “classes for everything”-paradigm yet)

As you can see, both values carry at least a little semantic meaning and either increase readability (HOUR) or may change over time (DEFAULT_TIMEOUT). But sometimes during code reviews I see comments like this (simplified example):

date('H');
// Constant string: Make a constant out of it

Let’s assume I make a (class) constant out of it. Might it change some time in the future? Almost certainly not. Can it make anything clearer? If you know the manual (hopefully you do), probably not. Does it increase readability? In this case it can even make things worse (maybe not that obvious at first glance):

use Foo\Bar\Constant;
date(Constant::HOUR);

So why is this worse? It is longer, which isn’t bad on its own, just unnecessary. It has a reference to an (otherwise) unrelated class, which is also acceptable. And it decreases clarity, because its name doesn’t say whether it uses leading zeros or whether it is in 12-hour or 24-hour format. The solution would be something like

class Constant {
    const HOUR_12H_WITHOUT_LEADING_ZEROS = 'g';
    const HOUR_12H_WITH_LEADING_ZEROS = 'h';
    const HOUR_24H_WITHOUT_LEADING_ZEROS = 'G';
    const HOUR_24H_WITH_LEADING_ZEROS = 'H';
}

Now remember the other date-related formatting characters. Or think of combining them… Doesn’t sound fun anymore, does it? What about “weekday”? Does that tell you whether it’s a numeric value, the name, or the shortened name? That sounds like it will end up in a huge bunch of constants with unnecessarily long names for something you can read in the official, publicly available manual.

Whenever I read “That could be a constant” I usually think “Well, a constant value is a constant too”. Sometimes there is simply no good reason to substitute constant values with a constant. Using 3600 as a “timestamp”-ish parameter should be clear to every developer. If you are concerned that somebody could misunderstand it, try 60*60 instead. If you have a constant DEFAULT_TIMEOUT, you may use it only once, because for other connections you may use different defaults. Is it worth creating this separation between the value and its use? If you want to change it, you’ll probably not look at the constant first, but at the call where it is used, so the constant is always an indirection. Now you want to use a different timeout for this single connection, so you’ll remove the reference to the constant anyway and maybe even add a new constant DEFAULT_TIMEOUT_XY_CONNECTION.

Think before you scream for constants. Not every constant is helpful. Some are at best superfluous. But having all these indirections – because it won’t end with a single superfluous constant – can get really distracting.