Friday, 15 April 2011

Thoughts about Data and Information

Question: What is information?

The way I see it, information is data relevant to a given context. In other words, if data can provide an answer to a question, it is information; otherwise it has very little practical value.

Consider an online service that is quietly collecting data about network traffic and user behaviour. Having oodles of data in log files or in a database does little good on its own. It is only after someone begins to ask context-specific questions that the data begins to acquire practical value by virtue of providing answers.

It follows that logging and other methods of gathering data may turn out to be a waste of time and resources (which often equals money) if they do not attempt to provide answers to questions. In other words, when designing a service one should also be mindful of what is being logged and why.

I've seen too many systems that have logging disabled in production because of the amount of data they collect every day: constantly writing to a log file consumes limited disk space and can even slow down the service. So when something goes wrong in production there are no records, because the logs are only used by developers and testers to debug the system before it goes live. And even when logs are enabled in production, it often turns out that the data that might be relevant is not being logged at all.

So before implementing data gathering of any kind, someone should think about which questions are most likely to be asked when the service is in production, and then consider what data would be relevant within that context.
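
To make that idea a bit more concrete, here is a minimal sketch in Python of what question-driven logging could look like. Everything in it is my own assumption for illustration: the `QUESTIONS` table, the two example questions, the field names, and the helper `log_for` are hypothetical and do not come from any particular system. The point is only that each log entry is tied to a question someone expects to ask, and that data nobody asked for is not written.

```python
import io
import json
import logging

# Hypothetical table: each question we expect to ask in production,
# mapped to the fields a log entry needs in order to answer it.
QUESTIONS = {
    "which_requests_were_slow": ("path", "duration_ms"),
    "who_hit_the_error": ("user_id", "error"),
}


def make_logger(stream):
    """Build a logger that writes one plain message per line to `stream`."""
    logger = logging.getLogger("service.questions")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def log_for(logger, question, **fields):
    """Log only the fields needed to answer one known question."""
    required = QUESTIONS[question]  # KeyError: nobody has asked for this data
    missing = [name for name in required if name not in fields]
    if missing:
        raise ValueError(f"{question} needs fields: {missing}")
    entry = {"question": question}
    # Fields that do not help answer the question are dropped, not logged.
    entry.update({name: fields[name] for name in required})
    logger.info(json.dumps(entry, sort_keys=True))


if __name__ == "__main__":
    import sys

    logger = make_logger(sys.stdout)
    log_for(logger, "which_requests_were_slow",
            path="/search", duration_ms=840, debug_blob="x" * 1000)
```

The `debug_blob` argument in the usage example is silently left out of the log, which keeps the volume down, while a call that forgets `duration_ms` fails loudly during development rather than leaving a gap that is only discovered in production.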