
Nginx Essentials is available for pre-order

Nginx Essentials is available for pre-order! This is my first book ever, and my first book about Nginx. The book is conceived as a “programmer’s view of Nginx administration” and aims to enrich webmasters’ and site reliability engineers’ knowledge with subtleties known to those who have a deep understanding of the Nginx core.

At the same time, Nginx Essentials is a ‘from-the-start’ guide that lets newcomers switch to Nginx easily under experienced guidance.

I’ve enjoyed writing this book, it’s been a great way to re-solidify my knowledge, and I hope you’ll enjoy reading it as much as I enjoyed writing it!


Logging modules re-released!

A few months ago I shut down the pages of the nginx socketlog module and the nginx redislog module because of the excessive volume of support they required. Some people, however, found them interesting pieces of technology, and I received several requests to release them.

Today I am releasing these modules under a free licence. Meet the released nginx socketlog module and nginx redislog module!


Counting unique visitors

In version 1.0.2 of the redislog module I added a feature that allows conditional logging. What can you do with it? For example, log only unique visitors:

userid         on;
userid_name    uid;
userid_expires 365d;

access_redislog test "$server_name$redislog_yyyymmdd" ifnot=$uid_got;

$uid_got becomes empty whenever a visitor doesn’t have a UID cookie. Therefore, this directive effectively logs one hit per unique visitor: the hit made before the visitor has been assigned a UID. You can populate a list (one per virtual host and day, or per hour) with unique visitor records and count them with LLEN. To do that, use the LPUSH command instead of the default APPEND. Details can be found in the manual.
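The ifnot= condition boils down to a simple per-hit test. Below is a minimal C sketch, not the module’s actual code, of what the configuration above amounts to: a hit is pushed to the day’s list only when $uid_got is empty, so LLEN on that list ends up equal to the number of such hits. The function names are hypothetical, and treating an unset variable like an empty one is an assumption.

```c
#include <stddef.h>

/* Hypothetical model of `ifnot=$uid_got`: log the hit only if the variable
 * is empty, i.e. the visitor sent no UID cookie. Treating NULL (variable not
 * set at all) like an empty string is an assumption. */
int should_log_ifnot(const char *uid_got)
{
    return uid_got == NULL || uid_got[0] == '\0';
}

/* Given the $uid_got value of each hit in one day, return how many entries an
 * LPUSH per logged hit would add to that day's list, i.e. what LLEN reports. */
size_t count_logged_hits(const char *uid_got_per_hit[], size_t n)
{
    size_t list_len = 0;

    for (size_t i = 0; i < n; i++) {
        if (should_log_ifnot(uid_got_per_hit[i])) {
            list_len++; /* one LPUSH for this hit */
        }
    }
    return list_len;
}
```

For example, a day with hits whose $uid_got values are "", "uid=0A0B0C0D", "" and "uid=0A0B0C0D" would leave two entries in the list.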


Better logging for nginx

Somehow the problem of logging has never been completely addressed in nginx. I implemented two solutions for remote logging: the nginx socketlog module and the nginx redislog module. Each of these modules maintains one permanent connection per worker to the logging peer (a BSD syslog server or a Redis database server). Messages are buffered in a 200k buffer even when the logging peer is offline and are pushed to the peer as soon as possible.

If the logging connection is interrupted, these modules try to re-establish it periodically and, if successful, flush the buffered messages to the remote peer. That means that if the logging peer closes the connection gracefully, you can restart it without restarting nginx.
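The buffering behaviour described above can be sketched roughly as follows. This is a hypothetical illustration in plain C, not the modules’ source: the 200k buffer keeps accepting messages while the peer is offline, and a flush after reconnecting pushes out whatever the peer accepts and keeps the rest. The drop-when-full policy, the send_fn callback and the in-memory sink used as a stand-in peer are all assumptions.

```c
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE (200 * 1024) /* "200k" from the text; exact size assumed */

typedef struct {
    char   data[LOG_BUF_SIZE];
    size_t used;      /* bytes waiting to be sent */
    int    connected; /* set by the (hypothetical) reconnect timer */
    size_t dropped;   /* messages rejected because the buffer was full */
} log_buffer_t;

/* Hypothetical transport: returns how many bytes the peer accepted. */
typedef size_t (*send_fn)(const char *data, size_t len, void *ctx);

/* Buffer a message whether or not the peer is online.
 * Dropping messages that do not fit is an assumed policy. */
int log_append(log_buffer_t *b, const char *msg, size_t len)
{
    if (len > sizeof(b->data) - b->used) {
        b->dropped++;
        return -1;
    }
    memcpy(b->data + b->used, msg, len);
    b->used += len;
    return 0;
}

/* Push as much as the peer accepts; keep the unsent tail for later. */
size_t log_flush(log_buffer_t *b, send_fn send_cb, void *ctx)
{
    if (!b->connected || b->used == 0) {
        return 0;
    }
    size_t sent = send_cb(b->data, b->used, ctx);
    if (sent > b->used) {
        sent = b->used;
    }
    memmove(b->data, b->data + sent, b->used - sent);
    b->used -= sent;
    return sent;
}

/* Called when a periodic reconnect attempt succeeds. */
void log_on_reconnect(log_buffer_t *b, send_fn send_cb, void *ctx)
{
    b->connected = 1;
    log_flush(b, send_cb, ctx);
}

/* Test double standing in for the syslog/Redis peer. */
typedef struct {
    char   out[256];
    size_t len;
} sink_t;

size_t sink_send(const char *data, size_t len, void *ctx)
{
    sink_t *s = ctx;
    size_t room = sizeof(s->out) - s->len;
    size_t n = len < room ? len : room;

    memcpy(s->out + s->len, data, n);
    s->len += n;
    return n;
}
```

Keeping the unsent tail rather than discarding it is what lets messages logged during an outage reach the peer once the connection is back.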

In addition, the redislog module can generate destination key names dynamically, so you can do some interesting tricks with it, e.g. keep one log per server, or per IP per day.
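Dynamic key names are essentially templates evaluated per request. The sketch below, with hypothetical names, shows how a key such as "$server_name$redislog_yyyymmdd" could expand, assuming $redislog_yyyymmdd yields an eight-digit date; using a prefix that also contains the client address gives a per-IP-per-day key.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append an 8-digit date (YYYYMMDD) to a key prefix,
 * mimicking how "$server_name$redislog_yyyymmdd" might expand.
 * Returns 0 on success, -1 on an invalid date or if the key does not fit. */
int build_daily_key(char *out, size_t size, const char *prefix,
                    int year, int month, int day)
{
    if (year < 0 || year > 9999 || month < 1 || month > 12
        || day < 1 || day > 31)
    {
        return -1;
    }
    if (strlen(prefix) + 8 >= size) { /* 8 date digits plus the NUL */
        return -1;
    }
    int n = snprintf(out, size, "%s%04d%02d%02d", prefix, year, month, day);

    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

With prefix "example.com" and 1 March 2015 this produces "example.com20150301"; with prefix "example.com:192.0.2.1:" it produces "example.com:192.0.2.1:20150301".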

Take a look at the manual to get an idea of how it works.


Configuration directives

In one of the previous articles I discussed the basics of HTTP modules. Since the power of Nginx comes from its configuration files, you need to know how to make your module configurable and ready for the variety of environments its users may have. Here is how you can define configuration directives. Continue reading “Configuration directives”
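A pattern you meet as soon as your module has directives is merging: a value left unset in a location inherits from the enclosing block or falls back to a default. The sketch below mimics nginx’s NGX_CONF_UNSET / ngx_conf_merge_value idiom with simplified, self-contained stand-ins; CONF_UNSET, conf_merge_value and my_loc_conf_t are hypothetical names, and real merge callbacks take an ngx_conf_t pointer plus void pointers and return a char * status.

```c
/* Simplified stand-ins for nginx's NGX_CONF_UNSET and ngx_conf_merge_value;
 * they mimic the idiom but are not the real nginx macros. */
#define CONF_UNSET (-1L)

#define conf_merge_value(conf, prev, default_value)                     \
    do {                                                                \
        if ((conf) == CONF_UNSET) {                                     \
            (conf) = ((prev) == CONF_UNSET) ? (default_value) : (prev); \
        }                                                               \
    } while (0)

/* Hypothetical per-location configuration of a module with two directives. */
typedef struct {
    long enable;      /* e.g. a hypothetical "mymodule on|off;" */
    long buffer_size; /* e.g. a hypothetical "mymodule_buffer_size 8192;" */
} my_loc_conf_t;

/* Plays the role of create_loc_conf: everything starts out unset. */
void my_init_loc_conf(my_loc_conf_t *conf)
{
    conf->enable = CONF_UNSET;
    conf->buffer_size = CONF_UNSET;
}

/* Plays the role of merge_loc_conf: unset values inherit from the enclosing
 * block (prev) or fall back to a default. */
void my_merge_loc_conf(const my_loc_conf_t *prev, my_loc_conf_t *conf)
{
    conf_merge_value(conf->enable, prev->enable, 0L);
    conf_merge_value(conf->buffer_size, prev->buffer_size, 4096L);
}
```

If the enclosing block sets enable to 1 and a location only sets buffer_size, the merged location ends up with enable 1 and its own buffer_size; a location that sets nothing gets enable 1 and the default 4096.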


Measuring time spent on page

One of the challenges of A/B testing is an insufficient number of observations due to low traffic. In other words, if we measured the conversion rate on our web site, it would take months or even years before we got a conclusive result. What you can try to measure instead are microconversions and microobservations. That’s what I was up to recently. So far I have identified a couple of microobservation types: time spent and depth. Time spent is how many seconds a visitor has spent on the web site, and depth is how many clicks they made after seeing the landing page. As you may notice, you always get some time spent and depth measurements unless the visitor is a bot.

Another way to enlarge your data set is to use visits instead of visitors. For the time spent and depth metrics this makes much more sense.

I used the standard Nginx userid module to identify visitors. When a visitor requests a page, a special action in a C++ application is called through a subrequest using the ssi module. This action registers the UID and the experiment in a memory table and assigns a variant (A or B). It then returns the variant in its response, and the variant gets stored in an Nginx variable. After that I use the value of this variable to display the proper variant of the page.
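The variant assignment step can be illustrated with a small in-memory table keyed by UID and experiment. This is a hypothetical C sketch, not the author’s C++ code: the fixed-size table, the linear lookup, the 64-byte UID limit and the round-robin A/B policy are all assumptions. What matters is that a visitor who already has a variant gets the same one back on every request.

```c
#include <string.h>

#define MAX_ASSIGNMENTS 1024
#define MAX_UID_LEN     64

typedef struct {
    char uid[MAX_UID_LEN];
    int  experiment;
    char variant; /* 'A' or 'B' */
} assignment_t;

typedef struct {
    assignment_t  entries[MAX_ASSIGNMENTS];
    size_t        count;
    unsigned long next; /* round-robin counter (assumed policy) */
} assignment_table_t;

/* Returns the variant for (uid, experiment), assigning one the first time the
 * pair is seen. Returns 0 if the UID is too long or the table is full. */
char assign_variant(assignment_table_t *t, const char *uid, int experiment)
{
    size_t uid_len = strlen(uid);

    if (uid_len >= MAX_UID_LEN) {
        return 0;
    }
    for (size_t i = 0; i < t->count; i++) {
        assignment_t *e = &t->entries[i];
        if (e->experiment == experiment && strcmp(e->uid, uid) == 0) {
            return e->variant; /* returning visitor keeps its variant */
        }
    }
    if (t->count == MAX_ASSIGNMENTS) {
        return 0;
    }
    assignment_t *e = &t->entries[t->count++];
    memcpy(e->uid, uid, uid_len + 1);
    e->experiment = experiment;
    e->variant = (t->next++ % 2 == 0) ? 'A' : 'B';
    return e->variant;
}
```

A production table would more likely use a hash lookup; the linear scan just keeps the sketch short.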

To track time I use JavaScript code that sends periodic updates to the server. Nginx passes these requests to the C++ application via FastCGI, and the application updates the timestamps in the memory tables. The depth tracker works in the same way, but its tracking action is invoked only when a page is loaded. Periodic updates might produce an intensive load on the server even for medium-sized sites, but, as you might already know, for Nginx it’s a piece of cake.
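Time spent and depth can be derived from these updates. Here is a minimal sketch, assuming each visit record tracks the first and last timestamps it has seen (in Unix seconds) and counts page loads after the landing page; the struct and function names are hypothetical. Time spent measured this way is only as precise as the ping interval.

```c
/* Hypothetical per-visit record; timestamps are Unix seconds. */
typedef struct {
    long long first_seen;
    long long last_seen;
    unsigned  depth;   /* page loads after the landing page */
    int       started;
} visit_t;

static void touch(visit_t *v, long long now)
{
    if (!v->started) {
        v->first_seen = v->last_seen = now;
        v->started = 1;
    } else if (now > v->last_seen) {
        v->last_seen = now;
    }
}

/* Periodic "still on the page" update sent by the browser script. */
void on_ping(visit_t *v, long long now)
{
    touch(v, now);
}

/* Page-load update: the first one is the landing page, later ones add depth. */
void on_page_load(visit_t *v, long long now)
{
    if (v->started) {
        v->depth++;
    }
    touch(v, now);
}

/* Seconds between the first and the last update of the visit. */
long long time_spent(const visit_t *v)
{
    return v->started ? v->last_seen - v->first_seen : 0;
}
```

A visit that lands at t=1000, pings at 1015 and 1030, loads another page at 1040 and pings once more at 1055 ends up with 55 seconds spent and a depth of 1.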

A separate thread in the C++ application periodically saves the contents of the memory tables to a file, and that’s how the observations get stored permanently.

Of course, this approach requires JavaScript to work in the client’s browser, but who doesn’t have it nowadays? A positive side effect is that bots get filtered out automatically.

One of the interesting questions is what statistical distribution time spent and depth follow. My hypothesis was that they are exponentially distributed, but this is still not completely clear to me. I spent some time implementing code for calculating statistical properties of the exponential distribution; it is not trivial, the results don’t look very trustworthy, and I haven’t had success with it yet. Instead, for now I’m using normal distribution properties for time spent and depth. After removing outliers, these numbers look very trustworthy.
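With the normal-distribution approach, the per-variant summary boils down to a mean after outlier removal, a sample standard deviation and a confidence interval. The sketch below assumes outliers are removed with fixed lower and upper bounds chosen by the analyst (the text does not say which rule was used) and uses 1.96 for an approximate 95% interval; a small Newton’s-method square root keeps it free of libm.

```c
#include <stddef.h>

typedef struct {
    size_t n;       /* observations kept after removing outliers */
    double mean;
    double sd;      /* sample standard deviation */
    double ci_low;  /* approximate 95% confidence interval for the mean */
    double ci_high;
} summary_t;

/* Newton's method, so the sketch does not need libm. Starting at or above
 * the root, the iterates decrease until they stop changing. */
static double sqrt_newton(double x)
{
    if (x <= 0.0) {
        return 0.0;
    }
    double r = x > 1.0 ? x : 1.0;
    for (;;) {
        double next = 0.5 * (r + x / r);
        if (next >= r) {
            return r;
        }
        r = next;
    }
}

/* Values outside [lo, hi] are treated as outliers (assumed rule). */
summary_t summarize(const double *x, size_t n, double lo, double hi)
{
    summary_t s = {0, 0.0, 0.0, 0.0, 0.0};
    double sum = 0.0, ss = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (x[i] >= lo && x[i] <= hi) {
            sum += x[i];
            s.n++;
        }
    }
    if (s.n == 0) {
        return s;
    }
    s.mean = sum / (double) s.n;
    for (size_t i = 0; i < n; i++) {
        if (x[i] >= lo && x[i] <= hi) {
            ss += (x[i] - s.mean) * (x[i] - s.mean);
        }
    }
    if (s.n > 1) {
        s.sd = sqrt_newton(ss / (double) (s.n - 1));
    }
    double half_width = 1.96 * s.sd / sqrt_newton((double) s.n);
    s.ci_low = s.mean - half_width;
    s.ci_high = s.mean + half_width;
    return s;
}
```

For time-spent values {10, 20, 30, 5000} with bounds [0, 1000], the 5000 is dropped, leaving a mean of 20, a standard deviation of 10 and an interval of roughly 8.7 to 31.3 seconds. Comparing such intervals for variants A and B is one simple way to read the results.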


An HTTP module basics and configuration

In the previous article I explained how modules of all types link into Nginx. Now let’s look more closely at the specifics of HTTP modules.

An HTTP module has the value NGX_HTTP_MODULE in its type field, and its ctx field points to a global instance of the structure ngx_http_module_t: Continue reading “An HTTP module basics and configuration”
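For orientation, ngx_http_module_t holds eight callbacks, any of which may be NULL. The sketch below declares a self-contained stand-in with the same field order so it compiles without nginx headers; ngx_conf_t is left opaque, and my_loc_conf_t, my_create_loc_conf and my_module_ctx are hypothetical. A real module would include nginx’s headers and allocate its configuration from cf->pool with ngx_pcalloc instead of calloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins so this compiles without nginx. In a real module these types come
 * from <ngx_config.h>, <ngx_core.h> and <ngx_http.h>. */
typedef intptr_t ngx_int_t;
typedef struct ngx_conf_s ngx_conf_t; /* opaque here */

/* Same field order as nginx's ngx_http_module_t. */
typedef struct {
    ngx_int_t (*preconfiguration)(ngx_conf_t *cf);
    ngx_int_t (*postconfiguration)(ngx_conf_t *cf);

    void *(*create_main_conf)(ngx_conf_t *cf);
    char *(*init_main_conf)(ngx_conf_t *cf, void *conf);

    void *(*create_srv_conf)(ngx_conf_t *cf);
    char *(*merge_srv_conf)(ngx_conf_t *cf, void *prev, void *conf);

    void *(*create_loc_conf)(ngx_conf_t *cf);
    char *(*merge_loc_conf)(ngx_conf_t *cf, void *prev, void *conf);
} ngx_http_module_t;

/* Hypothetical module configuration and callback. */
typedef struct {
    intptr_t enable;
} my_loc_conf_t;

static void *my_create_loc_conf(ngx_conf_t *cf)
{
    (void) cf; /* a real module would use ngx_pcalloc(cf->pool, ...) */
    my_loc_conf_t *conf = calloc(1, sizeof(my_loc_conf_t));
    if (conf != NULL) {
        conf->enable = -1; /* "unset", like NGX_CONF_UNSET */
    }
    return conf;
}

/* The global instance that the module's ctx field would point to. */
static ngx_http_module_t my_module_ctx = {
    NULL,               /* preconfiguration */
    NULL,               /* postconfiguration */
    NULL,               /* create_main_conf */
    NULL,               /* init_main_conf */
    NULL,               /* create_srv_conf */
    NULL,               /* merge_srv_conf */
    my_create_loc_conf, /* create_loc_conf */
    NULL                /* merge_loc_conf */
};
```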


How your module plugs into Nginx

In previous articles I deliberately omitted almost everything related to linking your module with Nginx. It is important, however, that you know about it.

Let’s take a closer look at the metainformation that your module must contain, so that Nginx can initialise and configure it. Continue reading “How your module plugs into Nginx”


Working with cookies in a Nginx module

Imagine you run a PPC advertising campaign and want to find out how many visitors coming from a search engine result in sales. We will create an Nginx module and use cookies for this purpose. Whenever a visitor clicks on your ad, a landing page is requested with a tracking argument in it. The tracking argument looks like this: ‘?source=whatever’. We will put the content of the tracking argument into a cookie, which we will call the source cookie, and write it into a log file. Whenever a visitor makes a transaction (e.g. buys an article or makes a booking), the name of the source will be recorded, so we will be able to easily attribute every transaction to a source.
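The core of the landing-page logic is pulling the source argument out of the query string and turning it into a cookie. The sketch below does that in plain C under its own assumptions: the cookie is named source with Path=/ and no expiry, no URL-decoding is done, and values are restricted to letters, digits, ‘-’, ‘_’ and ‘.’ so nothing can break the header. find_arg and build_source_cookie are hypothetical helpers, not nginx API; a real module would read r->args and add the header to r->headers_out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the value of `name` from a query string such as
 * "a=1&source=whatever" into `out`. Returns the value length, or -1 if the
 * argument is missing or does not fit. No URL-decoding is done. */
int find_arg(const char *args, const char *name, char *out, size_t size)
{
    size_t name_len = strlen(name);
    const char *p = args;

    for (;;) {
        const char *end = strchr(p, '&');
        size_t seg_len = end ? (size_t) (end - p) : strlen(p);

        if (seg_len > name_len && strncmp(p, name, name_len) == 0
            && p[name_len] == '=')
        {
            size_t value_len = seg_len - name_len - 1;
            if (value_len >= size) {
                return -1;
            }
            memcpy(out, p + name_len + 1, value_len);
            out[value_len] = '\0';
            return (int) value_len;
        }
        if (end == NULL) {
            return -1;
        }
        p = end + 1;
    }
}

/* Accept only a conservative character set so the value cannot break the
 * Set-Cookie header (assumed policy). */
static int is_safe_value(const char *v)
{
    if (*v == '\0') {
        return 0;
    }
    for (; *v; v++) {
        if (!isalnum((unsigned char) *v) && *v != '-' && *v != '_' && *v != '.') {
            return 0;
        }
    }
    return 1;
}

/* Build the Set-Cookie value for the source cookie; Path=/ is an assumption. */
int build_source_cookie(char *out, size_t size, const char *source)
{
    if (!is_safe_value(source)) {
        return -1;
    }
    int n = snprintf(out, size, "source=%s; Path=/", source);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

For the query string "a=1&source=whatever&b=2", find_arg extracts "whatever" and build_source_cookie produces "source=whatever; Path=/".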

Let’s start by declaring a structure that will contain the configuration of our module: Continue reading “Working with cookies in a Nginx module”


How to return a simple page

Let’s see what we need to do to return a simple page to a client from an Nginx module. We need to generate a header and a body for the response. To send the response header we use the function ngx_http_send_header:

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

ngx_int_t ngx_http_send_header(ngx_http_request_t *r);

The only argument, r, is the request for which you want to generate and send a header. This function serializes the headers from the list of response headers r->headers_out into a buffer and sends this buffer to the client, or queues it in the output queue r->out if the client socket is not writable. The HTTP version in the status line is determined for you automatically. The HTTP status code is taken from r->headers_out.status, and the status text is filled in according to the status code.
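To make the serialization step concrete, here is a toy version in plain C. It only illustrates the output format (a status line, one “Key: value” line per header, then an empty line); it is not how nginx implements it, and it assumes HTTP/1.1 and a small hypothetical status-text table. Queuing into r->out when the socket is not writable is out of scope.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;
} header_t;

/* A few reason phrases; an unknown code gets an empty one. */
static const char *status_text(int status)
{
    switch (status) {
    case 200: return "OK";
    case 301: return "Moved Permanently";
    case 302: return "Found";
    case 304: return "Not Modified";
    case 404: return "Not Found";
    case 500: return "Internal Server Error";
    default:  return "";
    }
}

/* Toy serializer: status line, one "Key: value" line per header, then an
 * empty line. Returns the number of bytes written, or -1 if buf is too small. */
int serialize_header(char *buf, size_t size, int status,
                     const header_t *headers, size_t n)
{
    size_t len;
    int w = snprintf(buf, size, "HTTP/1.1 %03d %s\r\n", status, status_text(status));

    if (w < 0 || (size_t) w >= size) {
        return -1;
    }
    len = (size_t) w;

    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + len, size - len, "%s: %s\r\n",
                     headers[i].key, headers[i].value);
        if (w < 0 || (size_t) w >= size - len) {
            return -1;
        }
        len += (size_t) w;
    }

    if (size - len < 3) {
        return -1;
    }
    memcpy(buf + len, "\r\n", 3); /* copies the terminating NUL too */
    return (int) (len + 2);
}
```

With status 200 and the headers Content-Type: text/html and Content-Length: 5, the buffer holds "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\n".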

Let’s see how we can add a custom header line to our response header. Continue reading “How to return a simple page”