Web Site Report – January 2008

Ready for another look at the hum-drum routine of an obscure web site? Here are the monthly highlights for christian-sauve.com:

1. Mmm. Numbers…

My prickly “Urchin” web stats engine tells me that…

Report for: christian-sauve.com, January 2008
Total Visitors       8,089
Total Pageviews     19,570
(Corrected total  11,344)
Total Hits          22,753
Total Bytes Transferred   441.8MB
Average Visitors Per Day  260.93
Average Pageviews Per Day 631
(Corrected average      366)
Average Hits Per Day      733.96

The “corrected” numbers take out the CSS, robots.txt, PDFs, mis-filed graphic files (ICO, GIF, JPG) and other non-public files mistakenly considered “pages” by the statistics pre-digestion engine. All numbers are roughly the same as last month.
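For the curious, that kind of correction boils down to a filter on the requested path. Here is a minimal sketch in Python, assuming an exclusion list built from the file types named above. The function names and sample paths are made up for illustration, and this is not Urchin's own logic:

```python
from urllib.parse import urlsplit
import posixpath

# Assumed exclusion rules, based on the file types listed in the text above.
NON_PAGE_EXTENSIONS = {".css", ".pdf", ".ico", ".gif", ".jpg"}
NON_PAGE_FILES = {"/robots.txt"}


def is_real_page(request_path):
    """Return True if the path looks like a public page rather than a support file."""
    path = urlsplit(request_path).path.lower()  # drop any query string, ignore case
    if path in NON_PAGE_FILES:
        return False
    extension = posixpath.splitext(path)[1]
    return extension not in NON_PAGE_EXTENSIONS


def corrected_pageviews(request_paths):
    """Count only the requests that pass the is_real_page filter."""
    return sum(1 for path in request_paths if is_real_page(path))


# Hypothetical sample of requested paths.
sample = [
    "/index.html",
    "/style.css",
    "/robots.txt",
    "/favicon.ico",
    "/reviews.html",
    "/texts/essay.pdf",
]
print(corrected_pageviews(sample))  # 2
```

Only the two HTML pages survive the filter; everything else is the sort of request that inflates the raw pageview total.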

But wait! This month, we’ve got another opinion. As a sop to nebulous “new year’s resolutions”, I decided to experiment with Google Analytics, installed the code, braced myself for bad news and peeked at the results at the end of the month.

At first glance, the results for January are catastrophic:

Google Analytics Data, January 2008:

  • Total Visits: 962
  • Total Pageviews: 1,439 (No correction necessary)

Ouch! What the heck just happened here? Is Google Analytics mad, malicious or just plain nuts?

Well, you’ll have to sit down through a few paragraphs of technical explanation to understand what’s going on:

My “Urchin” web stats report generator is what’s known as a “log file analyser”: it looks at the information collected by my web server to figure out what’s going on. This is a valid approach (in fact, it’s a rock-solid way of looking at what the site is doing), but it does catch a lot of information that isn’t completely relevant to webmasters: It makes little distinction between human visitors and robots from search engines and spammers, for instance. Worse: its support for the concept of “visit” is based on assumptions and approximations. Meanwhile, Google Analytics works by embedding a small amount of JavaScript code on each page, code that refers to the Google site and provides more accurate information for those human visitors with JavaScript-capable browsers. That necessarily means that Google Analytics will capture less data. On the other hand, what data it does capture will be richer than what’s recorded by Urchin.

Additionally, you have to remember that my version of Urchin was last updated in 2002. Interestingly, the company working on Urchin was then bought by Google and (after another merger) became Google Analytics in late 2005. Being fully centralized, Google Analytics is constantly being improved, and a major update took place in November 2007. This becomes important when considering recently-introduced user agents or the usage pattern of newer media such as blogs.

All of which is to say that there are important differences between one product and the other. To investigate those differences myself, I grabbed a single day’s worth of web logs and started crunching numbers for comparison. “My” numbers for identifiable human visitors were about double those of Google, and 40% of what Urchin was telling me (with robots and spiders and everything). So there are a lot of salt grains to be taken when considering the exact numbers reported by Google Analytics. On the other hand, trends, orders of magnitude and information that’s not to be found in Urchin can be valuable if considered carefully… and it’s in that spirit that I’ll be comparing both sets of results.
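A rough sketch of that kind of log crunching, in Python, might look like the following. It assumes the server writes the common "combined" log format, that robots identify themselves in their user-agent string (many spammers don't), and that a visitor can be approximated by an address and user-agent pair. The hint list, function names and sample lines are all illustrative; this is not the exact procedure behind the numbers above.

```python
import re

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumed substrings that mark a self-declared robot.
ROBOT_HINTS = ("bot", "spider", "crawl", "slurp")


def is_robot(user_agent):
    """Guess whether a user-agent string belongs to a self-declared robot."""
    agent = user_agent.lower()
    return any(hint in agent for hint in ROBOT_HINTS)


def count_visitors(log_lines):
    """Return (humans, robots), counting distinct (host, user-agent) pairs."""
    humans, robots = set(), set()
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that don't fit the assumed format
        key = (match["host"], match["agent"])
        if is_robot(match["agent"]):
            robots.add(key)
        else:
            humans.add(key)
    return len(humans), len(robots)


# Made-up sample lines using documentation-reserved IP addresses.
sample_lines = [
    '192.0.2.10 - - [15/Jan/2008:10:00:00 -0500] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '192.0.2.10 - - [15/Jan/2008:10:01:00 -0500] "GET /reviews.html HTTP/1.1" '
    '200 7300 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '198.51.100.7 - - [15/Jan/2008:10:02:00 -0500] "GET /contact.html HTTP/1.0" '
    '200 2100 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
]
print(count_visitors(sample_lines))  # (1, 1)
```

Anything that hides behind a browser-like user agent lands in the "human" pile, which is one reason a hand count like this can come out higher than Google's.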

(But don’t expect me to get rid of Urchin, or web logs. In many cases, such as finding out what spammers are doing on the site, they offer information that will never be captured by Google.)

All of this being said, our top ten most popular pages according to Urchin are:

/index.html                    625
/texts/free-movie-tickets.htm  372
/reviews.html                  158
/about.html                    134
/contactt.html                 132
/reviews/2000/books00c.htm     120
/texts/solaris-explanation.htm 106
/reviews/1996/books96b.htm     105
/reviews/2002/books02d.htm     101
/search.html                   100

This is more or less the same ranking that we’ve seen for months. But let’s see what the human users tracked by Google Analytics are looking at:

1. /index.html 127
2. /reviews.html 109
3. /reviews/index.html 81
4. /texts/solaris-explanation.htm 53
5. /francais/index.html 43
6. /search.html 39
7. /texts/100films.htm 35
8. /about.html 33
9. /writings.html 33
10. /reviews/movies/2002.htm 29

Ignoring, for the moment, the humiliation of results that are a third of what Urchin is reporting, the slight differences here are fascinating. Google-tracked human users go for reviews and the review index. The “Solaris Explained” page is still popular (though the bounce rate of 94% is ferocious, as users look at the page and feel no need to go exploring the rest of the site). The contact page is practically ignored by human visitors, which confirms my suspicion of heavy spam spider activity. The Google Analytics results pass “real world” evaluation: I can believe, maybe more easily than the Urchin results, that those would in fact be the most-visited pages on the site.

If you care about such things (and who would not?), here’s a look at browser statistics for the month (by visitors, last month’s results in parentheses):

Netscape|6  4701 (4128)
Explorer|7   965 (923)
Explorer|6   753 (1041)
msnbot|1     307 (261)
Explorer|5   148 (new)

Little change here. I’m guessing that a few people got new computers with IE7 over the holidays…

But Google Analytics offers another view:

1. IE 7.0 331
2. IE 6.0 240
3. Firefox 2.0.0.11 232

Dramatically different, isn’t it? A good thump to Mozilla triumphalism, right? But this shouldn’t be surprising: Most Netscape|6 hits, after all, are from the same spiders and robots that Google Analytics excludes from its calculations. Again, I have the feeling that Google Analytics (which is regularly updated with new user-agent information) is far more accurate in terms of what human visitors are actually using.

One Google Analytics report that I found unexpectedly fascinating is the “Bounce” data telling me how many visitors look at only one page, and then leave. Bounce isn’t necessarily bad: For pages that are popular with search engines, such as my “Solaris Explained” page, it’s perfectly OK if people come in, are enlightened and leave without looking at the rest of the site. Ideally, though, I would want them to stay for a while… but life’s short for everyone. In any case, I found that according to Google Analytics, most of my top-level pages had acceptable bounce rates, whereas some of my popular pages (such as the “Solaris Explained” page) had bounce rates in the eighties and nineties. As expected, really.

2. Where do these people come from?

Our top five sources of referrals (in visitors) were

google.com/search      966 (1021)
www.google.ca/search   254 (264)
google.co.uk/search    107 (112)
live.com/results.aspx   81 (new)
google.com/books        72 (new)

Interesting appearances of both live.com (the new Microsoft search engine) and of Google Books.

As you may expect by now, Google Analytics has a slightly different view of the situation:

1. google / organic 669
2. yahoo / organic 25
3. aol / organic 13
4. entropypump.wordpress.com / referral 10
5. books.google.com / referral 9

(Lingo key: “Organic” is Google’s way of saying that no one has paid for ads leading back to christian-sauve.com on those search engines. “Referral” is a direct link to this site.)

Keep in mind that Google Analytics is optimized for maximizing Google Ad-Buys: there are a lot of interpretations built into its numbers. I suspect that all national Google sub-sites are aggregated together, and that a lot of number-crunching ensures that the data is “purer” than what can be deduced from server logs. Of course, Google Analytics provides me with a lot of extra information that Urchin doesn’t, such as “bounce rate” (people who only visit one page), “average time on site” (hocus-pocus calculation based on multiple page requests) and “new visits” (based on client-side cookie information).

In collecting referral information, Google Analytics seems noticeably stingier than Urchin. But keep in mind the “only tracking (most) human visitors” nature of its statistics: By nature, it’s built to miss a chunk of referrals.

On the other hand, it does deliver very detailed information on the visits it does capture: Thanks to the Google Analytics data and some good old-fashioned number-crunching in Excel, I was able to build a bubble-chart (Using Bounce rate, Pages per visit and number of visitors as my data axes) that revealed that my “best referrals” are coming from Entropy Pump: People coming from that blog (10) visited an average of ten pages per visit (!) and only had a 20% bounce rate.

Visitors Bubble Chart

Big Blue Google, on the other hand, performed worse than the all-referrals average, sending me visitors that bounced more often and visited fewer pages per visit. (The “best” search engine, according to those metrics? Microsoft’s Live, which sent a tiny but relatively more curious bunch of visitors.) My collaborative blog, Fractale Framboise, also did well. Direct Traffic was also noticeably “better” than average. Which does actually smell like reality: People coming from Entropy Pump and Fractale Framboise are my target audience, and people directly coming to this site, presumably via bookmarks, are already familiar with the content and looking for more.
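The three values behind a per-source comparison like this one (number of visits, pages per visit, bounce rate) are simple to derive from a list of visits. A minimal sketch in Python, assuming each visit is recorded as a (source, pages viewed) pair and that a "bounce" is a visit with exactly one page view; the function name and the sample data are invented for illustration and are not the real January figures:

```python
from collections import defaultdict


def summarize_by_source(visits):
    """Aggregate (source, pages_viewed) pairs into per-source metrics."""
    totals = defaultdict(lambda: {"visits": 0, "pages": 0, "bounces": 0})
    for source, pages_viewed in visits:
        stats = totals[source]
        stats["visits"] += 1
        stats["pages"] += pages_viewed
        if pages_viewed == 1:  # a one-page visit counts as a bounce
            stats["bounces"] += 1

    summary = {}
    for source, stats in totals.items():
        summary[source] = {
            "visits": stats["visits"],
            "pages_per_visit": stats["pages"] / stats["visits"],
            "bounce_rate": stats["bounces"] / stats["visits"],
        }
    return summary


# Invented sample: a small review blog versus a big search engine.
sample_visits = [
    ("blog", 12), ("blog", 1), ("blog", 9), ("blog", 8),
    ("search", 1), ("search", 1), ("search", 1), ("search", 3),
]
for source, metrics in summarize_by_source(sample_visits).items():
    print(source, metrics)
```

In this invented sample the blog sends visitors who read 7.5 pages per visit with a 25% bounce rate, against 1.5 pages and 75% for the search engine. Each source then becomes one bubble: two metrics for position, the third for size.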

And this, frankly, goes straight to the heart of what web statistics are supposed to accomplish: Provide insight as to the nature of the web site’s visitors. Google delivers truckloads of visitors who aren’t interested in looking for more? Logical. Specialized blogs delivering pre-interested visitors? Sounds like an insight that can lead to further action!

In fact, it’s as I was contemplating Google Analytics data that I had either a revelation or a mini-stroke of insanity: If my review navigation pages are popular and if my readers are coming from review blogs, doesn’t it make sense to convert said review section to a more manageable blogging infrastructure? With the possibilities inherent to blog content management, RSS feed updates, specialized search engines and the pressure of regular updates, wouldn’t it be a better site if I dumped everything into a blog?

Why yes, it would be. I spent years resisting the allure of transforming this site into a blog, and it took a free analytics tool to convince me that it would be the way to go. And it meshes with a few nasty suspicions about my own work: A blog would allow harsh reader feedback, demand more regular updates, force me to write to a wider audience, push me in the spotlight of reader attention, and simply force me to step up my efforts.

Inspiring, isn’t it?

Of course, there are tons of things to do until then, not the least of which will be to dump eleven years’ worth of reviews into a back-dated database, create a template, come up with a tags-and-categories navigation architecture, catch up to the backlog and fiddle with the blog configuration. And once it’s up, I’ve got to feed the machine regularly. Eek.

So don’t expect any major change until this summer. But the seed of the idea has definitely been planted, and I’m off to investigate the possibilities of a newly-bloggish infrastructure. Keep reading these Site Reports for further updates.

Google Analytics tells me a bit more than Urchin about who those visitors are. For instance, it attempts to detect geographical location. Few people will be surprised to learn that most visitors come from the United States, followed by Canada, the UK, Ireland and Australia. Most people use Windows, followed by Macintosh then Linux. (But there was one iPhone visitor!) Most people have a 1024×768 screen resolution. Most people have Flash 9. Most people have Java. Most people connect using cable or DSL. While I don’t really trust the exact numbers, the aggregation seems reasonable to me. Trend analysis, once we have even more numbers, will be more important than precise numbers.

In the spirit of Web Analytics, here’s an amusing new link to this site: quantcast.com says, about christian-sauve.com, “This site reaches fewer than 2000 U.S. monthly uniques. The site caters to a primarily older, highly educated, rather male audience.” Eh, fair enough.

3. Ohh! Visitor comments!

Nothing worth sharing in the January mailbox. (It’s been a slow month.)

4. Search Queries Oddities

Here are our top-ten queries:

>patricia pearcy nude       19
>being canadian             12
>ayn rand                   11
>christian sauve            11
>movie sneak previews       11
>christian                  10
>solaris explained           9
>frank camper                8
>free movie premiere tickets 8
>that bringas woman          8

Meh. It’s the same old, same old!

But as it happens, Google Analytics has a different view on the month:

1. christian sauvé 9
2. that bringas woman 8
3. frank camper 8
4. solaris explained 6
5. solaris explanation 4
6. fuel injected dreams 3
7. glenn kleier 3
8. sequel to teeth of the tiger 3
9. solaris+ending 3
10. teeth of the tiger sequel 3

Some familiar search queries here, and results that don’t exceed the Urchin equivalent numbers, though some Urchin favourites are nowhere to be found here. Once again, I’m inclined to consider the Google Analytics numbers to be generally closer to meaningful reality than the Urchin ones. Speaking of reality, it helps that Ayn Rand is not listed in the Google Analytics numbers.

Until next time, my name is Christian Sauvé and I remain… obsessed by web statistics.
