Horizontal Partitioning

So how do you store a lot of data when there is already more of it than you can handle? The simplest answer is to partition horizontally; in other words, divide and conquer. In horizontal partitioning, each data instance is stored on a single storage node, so you can increase your total capacity by adding more nodes. For horizontal partitioning […]

Comments are back!

A long time ago I disabled comments because I was getting a lot of spam. Apparently it was man-made spam, because it got past the CAPTCHA. Today I installed Social and also added Twitter and Facebook accounts. Please comment and follow!

Reading list: Foundations of Statistical Natural Language Processing

Recently I wanted to put my NLP knowledge on a more solid footing, and through a reference I came across this book: Foundations of Statistical Natural Language Processing. It’s a classic, and it was certainly a good read. The topics it discusses might sound quite theoretical, so let me translate them into a few examples of how each of […]

Scalability: is your problem WORM or RW?

I wanted to write an article about the secrets of scalability, but the subject turned out to be too complex for one article. Instead, let’s dissect some scalability problems as we go. When you think about scalability, it is important to distinguish between two types of problems: those that require reading much more often than […]

“Multi-armed bandit” A/B testing optimality proved?

Correct me if I’m wrong, but it seems that this paper proves the optimality of the “multi-armed bandit” approach to A/B testing, which was described in this post earlier this year. For those unfamiliar with the topic: A/B testing requires investment in the form of sample size (usually it is equal […]

Counting unique visitors

In version 1.0.2 of the redislog module I added a feature that allows you to do conditional logging. What can you do with it? For example, you can log only unique visitors:

    userid on;
    userid_name uid;
    userid_expires 365d;
    access_redislog test "$server_name$redislog_yyyymmdd" ifnot=$uid_got;

$uid_got becomes empty whenever a visitor doesn’t have a UID cookie. Therefore, this directive effectively […]

Better logging for nginx

Somehow the problem of logging was never fully addressed in nginx. I implemented two solutions for remote logging: the nginx socketlog module and the nginx redislog module. Each of these modules maintains one persistent connection per worker to the logging peer (a BSD syslog server or a redis database server). Messages are buffered in a 200k buffer even when […]

Configuration directives

In one of the previous articles I discussed the basics of HTTP modules. Since the power of Nginx comes from its configuration files, you need to know how to make your module configurable and ready for the variety of environments its users may have. Here is how you can define configuration directives.

Book review finished!

Yesterday I received my copy of Nginx 1 Web Server Implementation Cookbook. This book is a joint effort of several Nginx enthusiasts. I am proud to be one of the reviewers. Nginx 1 Web Server Implementation Cookbook is extremely useful for people who just want to know how to get going with Nginx. It contains […]

Measuring time spent on page

One of the challenges of A/B testing is collecting enough observations when traffic is low. For example, if we measured only the conversion rate on our web site, it would take months or even years to get a conclusive result. What we can measure instead are microconversions and micro-observations. That’s what I was up to […]