Sunday 22 February 2009

Thoughts about Agile Development

Not too many years ago RUP was the way to do things, and now being agile is the latest "right way" to do things.

It all started with The Agile Manifesto, which I think bears repeating:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in items on the right,
we value items on the left more.

As principles go, this is a good one. In my experience it reflects how things are done in real working life - or at least how they should be done.

The important thing is not to ignore the last sentence: "while there is value in items on the right, we value items on the left more". In other words, the manifesto does not tell us to ignore practical processes, good-enough documentation, proper contracts, or having a plan.

I used to work in a company that had taken the agile way, and Scrum in particular, to heart. When I joined the company it was actually inspiring to see how not just the techies but the management and even the main customer made a real effort to learn and adapt to the Scrum way of working. They were not just talking about it but also working on it. I do respect that.

Unfortunately it wasn't all good. Very few seemed to care about project documentation, because documenting things wasn't perceived as agile. Not true. What people might have been thinking of was the old-style waterfall process, where everything was designed and documented up front before any implementation work, but the alternative to that is not to have no documentation at all. Personally, I think it was just a convenient excuse for people to be lazy and cut corners.

Neither was it very common to plan ahead, since being agile was to do what was needed right then and there - in my mind, to think this way is to be short-sighted. Unless the program in question is the ever-so-popular Hello World, rarely is an application completed in just a few iterations. Incremental iterations are steps building towards the ultimate goal, so one must always be mindful of the next couple of steps, or the risk of major redesign and refactoring is likely to become an unfortunate fact. Now, surely that is not what the agile way of working is supposed to be about?

When Product Owners and managers agreed on the priorities and geared up to get things done, it often came as a nasty surprise when Somebody Else managed to convince the directors that their project was more important and should become the priority instead. This wasn't about "responding to change over following a plan"; it was more about being without a direction. At one point a project might be the company's top priority and then, without warning, be pushed aside and replaced by another top-priority project. I'm sure there were always seemingly good reasons for this, but how is it a good thing when projects go unfinished, professional employees get frustrated, and nobody can rely on what was agreed?

Being agile should not be the same as being sloppy and erratic. It is a principle that promotes common sense over dogmatic thinking: cutting the red tape, doing what needs to be done, and getting it done in a professional manner.


Saturday 14 February 2009

Thoughts about Ideas

I believed in ideas.

Yes, for a long time I believed in ideas. I looked at the world around me, saw all the man-made things and knew that each and every one of them was first conceived as an idea. Ideas of form and function, sometimes born together, sometimes one following the other. I believed in ideas, knowing that ideas have the power to shape the world.

But I made an error: I was wrong in thinking, in believing, that ideas have intrinsic value. This was followed by another error when I began to think that ideas could be owned. As I believed that ideas were valuable in their own right, I began to think that I must protect them, hide them from others lest they be stolen and taken away from me. Somebody else might realise them without me if I wasn't careful.

For a long time I was a fool. In all honesty, chances are that I still am, but perhaps for a different reason now.

I'd like to think that I now have a better understanding of ideas, of what they really are about. Lately there have been times when I have thought that yes, ideas do have intrinsic value, but that value is no more than the value of unfulfilled promises. Perhaps that is a cynical way of thinking about it, but to be cynical about something is to belittle it; so instead I choose to think that yes, ideas are valuable, and yes, they are needed to shape and to change the world, but they have no intrinsic value, nor can they be owned: to try to hide ideas is to try to keep the world from changing, and that just is not possible.

Perhaps ideas should be seen as road signs: they point out places to go and give directions, but beyond that they do very little to help us actually get anywhere. In the end what really matters is how ideas are implemented and how the implementations are used afterwards. For example, it is one thing to have an idea for a text processing application, a movie or a sculpture, but realising them would be an altogether different matter, and it is the end result that counts.

For a long time I tried to protect my ideas by hiding them from others, but obviously it made no difference. I think there are plenty of examples in the history of inventions showing that the same or similar ideas can be discovered independently by several individuals at roughly the same time. It is as if an idea seeks to manifest itself when the time is right, and an increasing number of people just keep coming up with similar ideas until one of them eventually has the right skills, resources, contacts, timing and, most importantly, luck to realise it before somebody else does. So it matters not if I get an idea and choose not to tell others about it; somebody else is bound to come up with the same idea (and most likely already has) on their own. Many ideas are born from a need to solve a problem, and as very few people have a truly unique way of thinking, is it any wonder that similarly thinking people, facing the same problem, come up with resembling solutions?

So, what next? I guess I still believe in ideas, and I do still feel protective when it comes to them, but that is just an old habit taking its time to die. Perhaps the next time I come up with an idea I'll write about it here. Perhaps I should get one of my notebooks from my bookshelf and see if some of the entries are interesting enough to be shared here (and believe me, while we all like to think that we recognise a good idea when we get one, I suspect the truth is that most of them are better off forgotten).

After all, what do I have to lose?

Thursday 12 February 2009

Thoughts about Cloud Computing

Some time ago a company working on their own cloud data storage solution contacted me and asked me to look into their public material and see if I might come up with some thoughts. I spent a weekend reading and thinking, and this is basically what I came up with. Mind you, cloud computing, while a really interesting topic, isn't my speciality. But then again, what is my speciality? After all, I'm just a software designer with a tendency to think about stuff.

Data Security
Cloud data storage should not be considered secure, as the users are in control of neither the server hardware nor its deep administration (available only to the virtual environment's administrators, the service providers). The database server hardware is maintained in unknown locations instead of in the users' own facilities, where they would be able to exercise physical and logical access control. It is a matter of trust that the host facilities are secure against outside intruders.

The same principle applies to the deep administration access rights. While users can be reasonably certain that other users cannot access their data, can they be certain that their data remains secure when it comes to the host's administrative personnel? How extensive are the deep administrative rights, how reliable are the host's employees, and how secure is the service against hackers in general? For example, can it be guaranteed that the data is never transmitted over insecure connections? Until these and other uncertainties can be resolved, the conclusion should be that cloud databases are not secure - at least, not when it really matters.

Solution: make the data itself secure. The obvious trade-off is somewhat weaker performance, but for many users security comes before performance. For these users it would be highly beneficial if improved security features were available when needed.

Encrypting the Data
By storing the data in encrypted form an additional layer of security can be created to protect the information content: while the data itself might be accessed by unauthorised parties they would still need to decrypt it before they can access the information.

In the Nexus (one of my not-so-successful projects, which I might talk about some other time) certain sensitive information was to be encrypted, and the process was managed by the server application: when data is accessed it is decrypted in memory, and encrypted again when stored to the database. For a service such as a cloud data storage system, it could be possible to implement a similar feature as an optional layer that encrypts / decrypts data as it passes through. Obviously it would be trickier than it sounds, but it should be doable nevertheless.
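As a rough illustration of such a layer, here is a minimal Python sketch of a store that encrypts values on write and decrypts them on read. The keystream construction and the `EncryptedStore` class are inventions of mine for this example only; a real system would use a vetted cipher such as AES instead.

```python
import hashlib
from itertools import count

def _keystream(key: bytes, nonce: bytes):
    # toy keystream: SHA-256 in counter mode; this is a placeholder,
    # not production cryptography
    for i in count():
        yield from hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()

def _xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; the same call both encrypts and decrypts
    return bytes(b ^ k for b, k in zip(data, _keystream(key, nonce)))

class EncryptedStore:
    """Wraps a plain key-value backend so values are encrypted at rest."""

    def __init__(self, backend: dict, key: bytes):
        self._backend = backend
        self._key = key

    def put(self, name: str, value: bytes) -> None:
        # derive a per-entry nonce from the entry name
        nonce = hashlib.sha256(name.encode()).digest()[:16]
        self._backend[name] = _xor(self._key, nonce, value)

    def get(self, name: str) -> bytes:
        nonce = hashlib.sha256(name.encode()).digest()[:16]
        return _xor(self._key, nonce, self._backend[name])
```

The point is the shape of the layer, not the cipher: the backend only ever sees ciphertext, while callers work with plaintext.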

Fragmenting Data Between Cloud Nodes
Data could be fragmented between two or more cloud nodes. This way, even if the security of one node is breached, the overall information is still more or less secure, as the attacker would need to infiltrate all the related nodes and then combine the fragments in the correct order. To make the protection even stronger, the fragmented data could also be encrypted, either before or after fragmentation.
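One simple way to fragment along these lines is byte striping - a toy illustration of my own, not any particular product's mechanism - where each fragment receives every n-th byte, so no single node holds a contiguous run of the original data:

```python
def fragment(data: bytes, n: int) -> list[bytes]:
    # stripe the bytes round-robin across n fragments: fragment i
    # gets bytes i, i+n, i+2n, ...
    return [data[i::n] for i in range(n)]

def reassemble(fragments: list[bytes]) -> bytes:
    # interleave the fragments back into the original byte order
    n = len(fragments)
    out = bytearray(sum(len(f) for f in fragments))
    for i, frag in enumerate(fragments):
        out[i::n] = frag
    return bytes(out)
```

An attacker with one fragment has only a thin, non-contiguous slice; only a party holding all fragments, and knowing their order, can reassemble the data.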

Secure Connection Between Cloud Nodes
When using cloud data storage one should assume insecure connections between database nodes, especially when the overall database structure is partitioned across two or more separate sites. Typically it is up to the database and system administrators to set up secure connections between servers, but when virtual servers and cloud services are being used, users are not likely to have any control over these issues. In this case it might be very attractive if a cloud data storage system could provide secure data transfers by, for example, using SSH tunnelling when secure connections between cloud nodes cannot be otherwise guaranteed.

Operational Data vs. Off-line Backups
For many cloud users it would be tempting to think that storing their data in a cloud ensures its safety. After all, isn't their data replicated across two or more cloud nodes, so that if one node becomes unavailable the data is still available through the other node(s)? However, to think this way could prove to be a fatal mistake.

The replicated data is operational. In other words, data changes in one node are replicated to all other nodes. If that operational data becomes corrupted in one node, the corruption is likely to spread across the cloud, and if the user has no off-line backups, they will quickly find themselves in a world of hurt.

The operational data can become corrupted through human error, a software bug or a malicious attack. Cloud databases are vulnerable to SQL injections and other similar attacks just like traditional database solutions.

A similar and equally serious threat is that somebody decides to drop a table or a whole database: the change in one node is replicated across the cloud in short order, and the damage is likely to be permanent.

Backup the Data
To protect against operational data corruption and other forms of data loss, one of the best solutions is to take off-line backups often enough; everybody knows this (though surprisingly few do it). The classic approach is to back up data from one cloud node to external media, and while doing so is recommended, this approach does have its own set of issues.

In a cloud environment, and with a cloud data storage system, off-line backups could be made more convenient. First of all, backups could be distributed to various nodes as long as they are kept separate from the operational data. Unless the cloud hardware or connections suffer massive failures, the backups should be secure. This would make data backup and recovery quite easy for the users. It should also be possible for a cloud data storage system to mount a local DVD (or similar) drive as a backup device and make it easier for users to back up their data from the cloud.
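The local side of such a backup routine is not complicated. Here is a sketch, assuming a database dump file has already been produced by whatever means the database offers; the function name and the retention policy are just illustrative choices:

```python
import shutil
import time
from pathlib import Path

def backup(dump_file: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Copy a fresh database dump into a dated off-line backup and
    prune the oldest copies beyond the retention limit."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{dump_file.stem}-{stamp}{dump_file.suffix}"
    shutil.copy2(dump_file, target)
    # timestamped names sort chronologically, so the oldest come first
    for old in sorted(backup_dir.glob(f"{dump_file.stem}-*"))[:-keep]:
        old.unlink()
    return target
```

The essential part is that the backups live somewhere the replication machinery never touches; otherwise a corrupted record would simply overwrite its own safety copy.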

Stored Procedures in a Cloud
Many applications utilise stored procedures and transfer some of the application logic to the database servers. This makes sense for several reasons. As the data is stored in a cloud and can be read through any node, the same should apply to stored procedures.

This is not without issues. Ideally the nodes would have similar resources and performance capabilities, but in practice this is not always so. As a result the service can exhibit variable response times, as slower nodes take longer to process requests. For example, a stored procedure might handle a massive amount of data, and if one node has less memory than the others, it could be so much slower that the request times out. Or even worse, the database crashes.

Nevertheless, replicating stored procedures to all cloud nodes is still preferable to having them all processed on a single master node.

Distributed Stored Procedure Processing
Instead of handing processing requests over to individual nodes, two or more nodes could be temporarily combined into a single virtual processing unit. This could be challenging, but when successfully implemented it could yield significant benefits for cloud users, as it would be sufficient to have nodes with fewer resources (read: cheaper nodes), which in turn could translate into having more nodes available in a more extensive cloud.
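The basic idea - shard the work, process the shards in parallel, combine the partial results - can be sketched as follows. This is purely illustrative: the "nodes" here are threads and the remote call is a stand-in function, since the real inter-node machinery is exactly the challenging part.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_node(node: str, shard: list[int]) -> int:
    # stand-in for a remote call to one cloud node; here the "node"
    # simply sums its own shard of the data
    return sum(shard)

def distributed_sum(nodes: list[str], data: list[int]) -> int:
    # split the data round-robin into one shard per node, fan the
    # shards out in parallel, then combine the partial results
    shards = [data[i::len(nodes)] for i in range(len(nodes))]
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(run_on_node, nodes, shards)
    return sum(partials)
```

Each node only needs enough memory for its own shard, which is the reason cheaper nodes could suffice.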

A cloud data storage system could be extended to control distributed stored procedure processing, although admittedly the greatest benefits would be enjoyed by services that continuously process large amounts of data on the database server. Today too many developers prefer to just load the information from the database into their (server) application and process the queries outside the database. Many of them end up dealing with various performance and system resource problems.

Still, it would be pretty cool to have this feature available when one needs it.

Track User's Geographic Location and Network Response Times
Larger clouds are likely to be global, or at least to span large geographic areas. When users move from place to place, or a project has people working on it from different locations, network response time can easily become a performance issue. In such a case it could be desirable if the cloud could identify which node is closest to a user (distance measured in response time) and asynchronously replicate the data across the cloud while the user is working through the fastest node. If the node's performance level drops below a certain threshold, the cloud could assign another, faster node to the user without the user ever noticing any difference.
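The node-selection logic could be as simple as this sketch (node names, latencies and the threshold are all made up for illustration; measuring the latencies is a separate problem):

```python
def fastest_node(latencies_ms: dict[str, float]) -> str:
    # pick the node with the lowest measured response time
    return min(latencies_ms, key=latencies_ms.get)

def assigned_node(current: str, latencies_ms: dict[str, float],
                  threshold_ms: float) -> str:
    # keep the user's current node while it performs acceptably;
    # hand the user over to the fastest node once it degrades
    if latencies_ms[current] <= threshold_ms:
        return current
    return fastest_node(latencies_ms)
```

Keeping the current node while it is "good enough" avoids needless hand-overs when two nodes keep trading places at the top of the list.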

Mobile Nodes and Stationary Nodes
In certain cases the cloud could consist of two types of nodes: mobile nodes and stationary nodes. The stationary nodes could be the primary data storages while mobile nodes could act as proxies between users and stationary cloud nodes when bandwidth is limited or there are security concerns (think military, but there are plenty of civilian cases as well). In these cases the users would query their local (mobile) node for data and if the requested data is available, great. If not then the mobile node forwards the query to closest (fastest responding) stationary node, which uploads the data to mobile node for later use. Naturally these mobile nodes would form a cloud of their own so the data uploaded to one mobile node is then replicated to all other mobile nodes within that local cloud.