Category Archives: nginx

How to implement an HTTP output filter in nginx

Internally, the nginx HTTP server has a stack of output filters. These filters modify the reply generated by other sources, such as phase handlers or an upstream. There are two types of output filters: header filters and body filters. Header filters apply to the header of the reply and body filters apply to its body.

Header filters

Header filters adapt the reply by manipulating the status and header lines of the reply. Nginx has a structure called ngx_http_headers_out_t that stores data to be sent out in the reply header. This structure appears as headers_out in the HTTP request structure ngx_http_request_t.

Let’s take a look at the most important internals of this structure (from version 1.17.6):

typedef struct {
    ngx_list_t                        headers;
    ngx_list_t                        trailers;

    ngx_uint_t                        status;
    ngx_str_t                         status_line;

    [...]
} ngx_http_headers_out_t;

Here we see headers, a list of header lines to be sent; trailers, a list of trailer lines to be sent; status, the status code of the reply; and status_line, the status line of the upcoming reply. The remaining members of this structure are omitted because they serve only specific purposes in the codebase.

Any time before the reply header is sent, you can extend the headers or the trailers, or change the status of the reply. This is sufficient for the majority of tasks that a header filter performs.

A header filter itself is a function with the following signature:

ngx_int_t header_filter(ngx_http_request_t *r);

That is, it takes a request as its only argument and returns a status code. The only status code that this function can return by itself is NGX_ERROR, indicating that an error has occurred and the header filter considers that processing of the reply cannot continue. In any other case a header filter must call the next header filter in the chain.

How is the filter chain formed? The Nginx codebase has a global variable ngx_http_top_header_filter, which always points to the top header filter in the stack. Upon startup Nginx calls the initialization function of each module. Each module can capture the pointer to the top filter in the stack and replace it with its own filter. Typically, every module that wants to install a header filter saves the top filter into a variable called ngx_http_next_header_filter:

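static ngx_http_output_header_filter_pt ngx_http_next_header_filter;

/* Forward declaration of the filter installed below */
static ngx_int_t my_header_filter(ngx_http_request_t *r);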

static ngx_int_t my_module_init(ngx_conf_t *cf)
{
    ngx_http_next_header_filter = ngx_http_top_header_filter;
    ngx_http_top_header_filter = my_header_filter;

    return NGX_OK;
}

This variable is declared static, so every module that installs a header filter has its own instance of it. By calling ngx_http_next_header_filter, a header filter passes execution to the next header filter in the chain.

At the bottom of the stack there is a function that takes everything from headers_out, transforms this data into a sequence of bytes and calls the so-called write filter, which writes everything to the socket. This is how the reply header gets sent out.

Consequently, if a header filter does not pass execution to the next filter in the chain, the HTTP reply header never gets written. This is an error.
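The chaining mechanism described above can be sketched without nginx at all. The following is a minimal, self-contained simulation in plain C, not nginx code: request_t, the FILTER_* constants, the module_a/module_b names and send_header are all hypothetical stand-ins chosen for illustration. It only shows how each module saves the current top of the stack in its own static variable and then installs itself.

```c
#include <stddef.h>

/* Simplified stand-ins for nginx types; NOT the real nginx definitions. */
typedef struct {
    int         status;
    const char *extra_header;  /* at most one added header line, for brevity */
    int         header_sent;   /* set by the bottom-of-stack filter */
} request_t;

typedef int (*header_filter_pt)(request_t *r);

#define FILTER_OK      0
#define FILTER_ERROR  -1

/* Bottom of the stack: stands in for the filter that serializes headers_out. */
static int bottom_header_filter(request_t *r)
{
    r->header_sent = 1;
    return FILTER_OK;
}

/* Plays the role of ngx_http_top_header_filter. */
static header_filter_pt top_header_filter = bottom_header_filter;

/* Each "module" keeps its own static copy of the next filter. */
static header_filter_pt module_a_next;
static header_filter_pt module_b_next;

/* Module A adds a header line to successful replies, then passes control on. */
static int module_a_filter(request_t *r)
{
    if (r->status == 200) {
        r->extra_header = "X-Header: x-value";
    }
    return module_a_next(r);
}

/* Module B refuses replies without a status and does not call the next filter. */
static int module_b_filter(request_t *r)
{
    if (r->status == 0) {
        return FILTER_ERROR;
    }
    return module_b_next(r);
}

/* Initialization: capture the current top filter, then become the new top. */
void module_a_init(void)
{
    module_a_next = top_header_filter;
    top_header_filter = module_a_filter;
}

void module_b_init(void)
{
    module_b_next = top_header_filter;
    top_header_filter = module_b_filter;
}

/* Entry point into the chain; always starts from the top of the stack. */
int send_header(request_t *r)
{
    return top_header_filter(r);
}
```

If module_a_init runs before module_b_init, a call to send_header goes through module B, then module A, then the bottom filter. When module B returns FILTER_ERROR, the bottom filter is never reached and header_sent stays 0, which mirrors the "header never gets written" case.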

So, what is the typical flow of a header filter? From the above it appears very simple: step 1, check whether any modification of the reply is needed; step 2, modify the reply; step 3, pass execution to the next filter:

static ngx_int_t my_header_filter(ngx_http_request_t *r)
{
    ngx_table_elt_t  *h;

    /* Check whether the reply needs adaptation. The status test below is
       only an example condition; substitute your own check. */
    if (r->headers_out.status == NGX_HTTP_OK) {

        /* Modify the reply by adding/changing header lines or changing
           the status. Here we append one header line. */
        h = ngx_list_push(&r->headers_out.headers);
        if (h == NULL) {
            return NGX_ERROR;
        }

        h->hash = 1;
        ngx_str_set(&h->key, "X-Header");
        ngx_str_set(&h->value, "x-value");
    }

    /* Call the next header filter in the chain */
    return ngx_http_next_header_filter(r);
}

When a header filter decides that no adaptation of the reply is needed, the execution is passed straight to the next filter and the reply remains intact.

Getting cookie value in nginx

nginx makes it extremely easy to extract the value of a cookie. Simply use the $cookie_<name> meta variable in whatever context you need.

Here is an example:

location / {
     proxy_set_header X-Session-id $cookie_sid;
     proxy_pass http://upstream;
}

In this configuration snippet we pass the request to the upstream named “upstream” and extend it with a header “X-Session-id” set to the value of the cookie named “sid”. Being a meta variable, $cookie_<name> can be used to control redirects, conditional configuration sections and upstream selection.
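As a rough illustration of what a variable such as $cookie_sid evaluates to, the sketch below pulls a named cookie out of a raw Cookie request header in plain C. It is not nginx's implementation: cookie_value is a hypothetical helper, name matching is case-sensitive for brevity, and whitespace around values is not trimmed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of cookie `name` from a Cookie header string into `out`.
 * Returns the value length, or -1 if the cookie is absent or does not fit.
 */
int cookie_value(const char *header, const char *name, char *out, size_t outlen)
{
    size_t      nlen = strlen(name);
    const char *p = header;

    while (*p != '\0') {
        const char *end, *eq;
        size_t      vlen;

        /* Skip the "; " separators between name=value pairs. */
        while (*p == ' ' || *p == ';') {
            p++;
        }
        if (*p == '\0') {
            break;
        }

        /* The current pair ends at the next ';' or at the end of the string. */
        end = strchr(p, ';');
        if (end == NULL) {
            end = p + strlen(p);
        }

        /* Match only when the whole name before '=' equals `name`. */
        eq = memchr(p, '=', (size_t) (end - p));
        if (eq != NULL && (size_t) (eq - p) == nlen
            && strncmp(p, name, nlen) == 0)
        {
            vlen = (size_t) (end - eq - 1);
            if (vlen + 1 > outlen) {
                return -1;
            }
            memcpy(out, eq + 1, vlen);
            out[vlen] = '\0';
            return (int) vlen;
        }

        p = end;
    }

    return -1;
}
```

For a header such as "lang=en; sid=abc123; theme=dark", asking for "sid" yields "abc123", which is the string that the configuration above would forward in X-Session-id.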

Nginx Essentials is available for pre-order

Nginx Essentials is available for pre-order! This is my first book ever and my first book about Nginx. The book is conceived as a “programmer’s view of Nginx administration” and intends to enrich web masters’ and site reliability engineers’ knowledge with subtleties known to those who have a deep understanding of the Nginx core.

At the same time, Nginx Essentials is a ‘from the start guide’ allowing newcomers to easily switch to Nginx under experienced guidance.

I’ve enjoyed writing this book, and it’s been a great re-solidification of my knowledge. I hope you’ll enjoy reading it as much as I enjoyed writing it!

Logging modules re-released!

A few months ago I shut down the pages of the nginx socketlog module and the nginx redislog module because of the excessive volume of support they required. Some people, however, found them to be interesting pieces of technology, and I received several requests to release them.

Today I release these modules under a free licence. Meet the released nginx socketlog module and nginx redislog module!

Counting unique visitors

In version 1.0.2 of the redislog module I added a feature that allows you to do conditional logging. What can you do with it? For example, you can log only unique visitors:

userid         on;
userid_name    uid;
userid_expires 365d;

access_redislog test "$server_name$redislog_yyyymmdd" ifnot=$uid_got;

$uid_got is empty whenever a visitor doesn’t have a UID cookie. Therefore, this directive logs a hit only when the visitor arrives without the cookie, which in effect records each unique visitor once. You can populate a list (one per virtual host and day or hour) with unique visitor records and count them with LLEN. For that, just use the LPUSH command instead of the default APPEND. Details can be found in the manual.

Better logging for nginx

Somehow the problem of logging was not completely addressed in nginx. I managed to implement two solutions for remote logging: the nginx socketlog module and the nginx redislog module. Each of these modules maintains one permanent connection per worker to the logging peer (a BSD syslog server or a redis database server). Messages are buffered in a 200k buffer even when the logging peer is offline and are pushed to the logging peer as soon as possible.

If the logging connection is interrupted, these modules periodically try to reestablish it and, once they succeed, the buffered messages get flushed to the remote. That is, if the logging peer closes the connection gracefully, you can restart it without restarting nginx.

In addition to that, the redis logging module is able to generate destination key names dynamically, so you can do some interesting tricks with it, e.g. having one log file per server or per IP per day.
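The buffering behaviour described above can be pictured with a small sketch. This is not the modules' source code: log_peer_t and the log_* functions are hypothetical names, the socket write is replaced by a byte counter, and dropping a message when the buffer is full is an assumption made here, since the text does not say what happens in that case.

```c
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE (200 * 1024)   /* the "200k buffer" from the text */

typedef struct {
    char    buf[LOG_BUF_SIZE];
    size_t  used;        /* bytes waiting to be sent */
    int     connected;   /* 1 while the connection to the peer is up */
    size_t  sent_total;  /* bytes handed to the peer; stands in for a socket write */
} log_peer_t;

/* Push buffered bytes to the peer if the connection is up. */
void log_flush(log_peer_t *p)
{
    if (!p->connected || p->used == 0) {
        return;
    }
    p->sent_total += p->used;   /* a real module would write p->buf to a socket */
    p->used = 0;
}

/* Queue one message. Returns 0 on success, or -1 if the buffer is full
   (assumption: the message is dropped in that case). */
int log_message(log_peer_t *p, const char *msg)
{
    size_t len = strlen(msg);

    if (len > LOG_BUF_SIZE - p->used) {
        return -1;
    }
    memcpy(p->buf + p->used, msg, len);
    p->used += len;

    log_flush(p);               /* no-op while the peer is offline */
    return 0;
}

/* Called when a periodic reconnect attempt succeeds. */
void log_reconnected(log_peer_t *p)
{
    p->connected = 1;
    log_flush(p);               /* deliver everything queued while offline */
}

/* Called when the peer closes the connection. */
void log_disconnected(log_peer_t *p)
{
    p->connected = 0;
}
```

Messages queued while the peer is offline stay in the buffer and are delivered in one go by log_reconnected, which is why the peer can be restarted without restarting the process that logs.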

Take a look at the manual in order to get the idea of how it works.

Configuration directives

In one of the previous articles I discussed the basics of HTTP modules. As the power of Nginx comes from configuration files, you definitely need to know how to configure your module and make it ready for the variety of environments that users of your module can have. Here is how you can define configuration directives.

Measuring time spent on page

One of the challenges of A/B testing is insufficient observations due to low traffic. In other words, if we measured only the conversion rate on our web site, it would take months or even years before we got a conclusive result. What you can try to measure instead are microconversions and microobservations. That’s what I was up to recently. I have identified a couple of microobservation types so far: time spent and depth. Time spent is how long a visitor has spent on the web site, in seconds, and depth is how many clicks they made after seeing the landing page. As you might notice, you always have some time spent and depth measurements, unless the visitor is a bot.
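To make the two metrics concrete, here is a minimal sketch that derives them from the timestamps of the hits in one visit. The visit_metrics_t type and visit_metrics function are hypothetical, and measuring time spent as the interval between the first and the last hit is an assumption: the time spent on the final page cannot be seen from hits alone.

```c
#include <stddef.h>

typedef struct {
    long    time_spent;  /* seconds between the first and the last hit */
    size_t  depth;       /* clicks made after the landing page */
} visit_metrics_t;

/* hits: UNIX timestamps of the page views of one visit, in ascending order. */
visit_metrics_t visit_metrics(const long *hits, size_t n)
{
    visit_metrics_t m = { 0, 0 };

    if (n == 0) {
        return m;                      /* no hits, nothing to measure */
    }

    m.time_spent = hits[n - 1] - hits[0];
    m.depth = n - 1;                   /* the first hit is the landing page */

    return m;
}
```

A visit consisting of hits at 1000, 1030 and 1095 seconds gives a time spent of 95 seconds and a depth of 2, while a single-hit visit gives zero for both.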

The other way you can enlarge your data set is by using visits instead of visitors. For the time spent and depth metrics this makes much more sense.

I used the standard Nginx userid module to identify visitors. When a visitor requests a page, a special action in a C++ application is requested through a subrequest using the ssi module. This action registers the UID and the experiment in an in-memory table and assigns a variant (A or B). It then returns the variant in the response, and the variant gets stored in an Nginx variable. After that I use the value of this variable to display the proper variant of the page.

An HTTP module basics and configuration

In the previous article I explained how modules of all types link into Nginx. Now let’s look closer at the specifics of HTTP modules.

An HTTP module has the value NGX_HTTP_MODULE in its type field, and its ctx field points to a global instance of the structure ngx_http_module_t.

How your module plugs into Nginx

In previous articles I have deliberately omitted almost everything related to the question of linking your module with Nginx. It is important, however, that you know about it.

Let’s take a closer look at the metainformation that your module must contain, so that Nginx can initialise and configure it.