Thursday, 12 February 2009

Thoughts about Cloud Computing

Some time ago a company working on its own cloud data storage solution contacted me and asked me to look at their public material and see if I might come up with some thoughts. I spent a weekend reading and thinking, and this is basically what I came up with. Mind you, cloud computing, while a really interesting topic, isn't my speciality. But then again, what is my speciality? After all, I'm just a software designer with a tendency to think about stuff.

Data Security
Cloud data storage should not be considered secure, as the users control neither the server hardware nor its deep administration (which is available only to the virtual environment’s administrators, i.e. the service providers). The database server hardware is maintained in unknown locations instead of in the users’ own facilities, where they would be able to exercise physical and logical access control. It is a matter of trust that the host facilities are secure against outside intruders.

The same principle applies to the deep administration access rights. While users can be reasonably certain that other users cannot access their data, can they be certain that their data remains secure when it comes to the host’s administrative personnel? How extensive are the deep administrative rights, how reliable are the host’s employees, and how secure is the service against hackers in general? For example, can it be guaranteed that the data is never transmitted over insecure connections? Until these and other uncertainties are resolved, the conclusion should be that cloud databases are not secure, at least not when it really matters.

Solution: make the data itself secure. The obvious trade-off is somewhat weaker performance, but for many users security comes before performance. For these users it would be highly beneficial if improved security features were available when needed.

Encrypting the Data
By storing the data in encrypted form, an additional layer of security can be created to protect the information content: while the data itself might be accessed by unauthorised parties, they would still need to decrypt it before they could access the information.

In the Nexus (one of my not-so-successful projects, which I might talk about more some other time) certain sensitive information was to be encrypted, with the process managed by the server application: when data is accessed it is decrypted in memory, and it is encrypted again when stored to the database. For a service such as a cloud data storage system, it could be possible to implement a similar feature as an optional layer that encrypts / decrypts data as it passes through. Obviously it would be trickier than it sounds, but it should be doable nevertheless.
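
As a rough illustration of such an optional layer, here is a minimal Python sketch. It is not how the Nexus did it; the class name `EncryptingLayer`, the dict standing in for the cloud store, and the caller-supplied `encrypt` / `decrypt` functions are all assumptions made for the example.

```python
from typing import Callable, Dict


class EncryptingLayer:
    """Hypothetical optional layer: encrypt on write, decrypt on read.

    `backend` stands in for the cloud data store (here just a dict).
    `encrypt` and `decrypt` are supplied by the caller, so this sketch
    does not imply any particular cipher.
    """

    def __init__(
        self,
        backend: Dict[str, bytes],
        encrypt: Callable[[bytes], bytes],
        decrypt: Callable[[bytes], bytes],
    ) -> None:
        self._backend = backend
        self._encrypt = encrypt
        self._decrypt = decrypt

    def put(self, key: str, plaintext: bytes) -> None:
        # Only the encrypted form ever reaches the backend.
        self._backend[key] = self._encrypt(plaintext)

    def get(self, key: str) -> bytes:
        # Decryption happens in memory, on the way back to the caller.
        return self._decrypt(self._backend[key])
```

The two functions could come from any vetted library; with the third-party `cryptography` package, for instance, the `encrypt` and `decrypt` methods of a `Fernet` object would fit this shape. Key management, which is probably the trickier part, is left out entirely.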

Fragmenting Data Between Cloud Nodes
Data could be fragmented between two or more cloud nodes. This way, even if the security of one node is breached, the overall information is still more or less secure, as the attacker would need to infiltrate all related nodes and then combine the fragments in the correct order. To make the protection even stronger, the fragmented data could also be encrypted either before or after fragmentation.
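
One simple way to picture the fragmentation is round-robin striping, sketched below in Python. The function names, the chunk size and the list-of-lists standing in for nodes are assumptions for illustration only; a real system would also have to track which fragment lives where.

```python
from typing import List


def fragment(data: bytes, node_count: int, chunk_size: int = 4) -> List[List[bytes]]:
    """Deal fixed-size chunks of `data` round-robin onto `node_count` nodes."""
    if node_count < 2:
        raise ValueError("fragmenting needs at least two nodes")
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    nodes: List[List[bytes]] = [[] for _ in range(node_count)]
    for index, start in enumerate(range(0, len(data), chunk_size)):
        # Chunk number `index` goes to node `index % node_count`.
        nodes[index % node_count].append(data[start:start + chunk_size])
    return nodes


def reassemble(nodes: List[List[bytes]]) -> bytes:
    """Interleave the chunks back in the order they were dealt out."""
    total = sum(len(chunks) for chunks in nodes)
    ordered = [nodes[i % len(nodes)][i // len(nodes)] for i in range(total)]
    return b"".join(ordered)
```

Striping on its own only hides part of the content from someone who breaches a single node, since each fragment is still readable; that is why combining it with encryption makes sense.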

Secure Connection Between Cloud Nodes
When using cloud data storage one should assume insecure connections between database nodes, especially when the overall database structure is partitioned across two or more separate sites. Typically it is up to the database and system administrators to set up secure connections between servers, but when virtual servers and cloud services are being used, users are not likely to have any control over these issues. In this case it might be very attractive if a cloud data storage system could provide secure data transfers by, for example, using SSH tunnelling when secure connections between cloud nodes cannot otherwise be guaranteed.
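
For the SSH tunnelling case, a small Python helper could assemble the OpenSSH command that forwards a local port to a database port on another node. The function name and the host names in the usage note are hypothetical; only the standard `ssh` options `-N` and `-L` are relied on.

```python
from typing import List


def ssh_tunnel_command(
    gateway: str, remote_host: str, remote_port: int, local_port: int
) -> List[str]:
    """Build an OpenSSH command for a local port forward.

    -N  do not run a remote command, only forward ports
    -L  local_port:remote_host:remote_port, as seen from the gateway
    """
    forward = f"{local_port}:{remote_host}:{remote_port}"
    return ["ssh", "-N", "-L", forward, gateway]
```

Passing the result of, say, `ssh_tunnel_command("admin@node-b.example", "localhost", 5432, 15432)` to `subprocess.Popen` would open the tunnel, after which replication traffic aimed at `localhost:15432` on the first node travels encrypted to port 5432 on the second.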

Operational Data vs. Off-line Backups
For many cloud users it would be tempting to think that storing their data in a cloud ensures its safety. After all, isn’t their data replicated across two or more cloud nodes, so that if one node becomes unavailable the data is still available through the other node(s)? However, to think this way could prove to be a fatal mistake.

The replicated data is operational. In other words, data changes in one node are replicated to all other nodes. If that operational data becomes corrupted in one node, the corruption is likely to spread across the cloud, and if the user has no off-line backups, the user will quickly find himself in a world of hurt.

The operational data can become corrupted through human error, a software bug or a malicious attack. Cloud databases are vulnerable to SQL injection and other similar attacks just like traditional database solutions.

A similar and equally serious threat is that somebody decides to drop a table or a whole database: the change in one node is replicated across the cloud in short order and the damage is likely to be permanent.

Back Up the Data
To protect against operational data corruption and other forms of data loss, one of the best solutions is to take off-line backups often enough; everybody knows this (though surprisingly few do it). The classic approach is to back up data from one cloud node to external media, and while this is advisable, the approach does have its own set of issues.

In a cloud environment with a cloud data storage system, off-line backups could be made more convenient. First of all, backups could be distributed to various nodes as long as they are kept separate from the operational data. Unless the cloud hardware / connections suffer massive failures, the backups should be secure. This would make data backup and recovery quite easy for the users. It should also be possible for a cloud data storage system to mount a local DVD (or similar) drive as a backup device and make it easier for users to back up their data from the cloud.
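
The difference between replicated operational data and an off-line backup can be shown with a toy Python model. Everything here (the `ReplicatedCloud` class, the table and row names) is made up for the illustration; the point is only that a destructive change replicates like any other change, while a separately kept copy is untouched.

```python
import copy
from typing import Dict, List

Database = Dict[str, List[str]]  # table name -> rows


class ReplicatedCloud:
    """Toy cloud in which every change is applied to all nodes."""

    def __init__(self, node_count: int) -> None:
        self.nodes: List[Database] = [{} for _ in range(node_count)]

    def insert(self, table: str, row: str) -> None:
        for node in self.nodes:
            node.setdefault(table, []).append(row)

    def drop_table(self, table: str) -> None:
        # The destructive change replicates just like any other change.
        for node in self.nodes:
            node.pop(table, None)

    def snapshot(self) -> Database:
        # Off-line backup: a deep copy that later changes cannot touch.
        return copy.deepcopy(self.nodes[0])

    def restore(self, backup: Database) -> None:
        for index in range(len(self.nodes)):
            self.nodes[index] = copy.deepcopy(backup)


cloud = ReplicatedCloud(node_count=3)
cloud.insert("orders", "order-1")
backup = cloud.snapshot()

cloud.drop_table("orders")
lost_everywhere = all("orders" not in node for node in cloud.nodes)

cloud.restore(backup)
```

After the drop, no node can help any other node recover, because they all faithfully applied the same change; only the snapshot taken beforehand brings the table back.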

Stored Procedures in a Cloud
Many applications utilise stored procedures and move some of the application logic onto the database servers. This makes sense for several reasons. As the data is stored in a cloud and can be read through any node, the same should apply to stored procedures.

This is not without some issues. Ideally nodes would have similar resources and performance capabilities, but in practice this is not always so. As a result the service can show variable response times, as slower nodes take longer to process requests. For example, a stored procedure might handle a massive amount of data, and if one node has less memory than the other nodes, it could be so much slower that the request times out. Or even worse, the database crashes.

Nevertheless, replicating stored procedures to all cloud nodes is still preferable to having them all processed on a single master node.

Distributed Stored Procedure Processing
Instead of handing processing requests over to individual nodes, two or more nodes could be temporarily combined into a single virtual processing unit. This could be challenging, but when successfully implemented it could yield significant benefits for cloud users, as it would be sufficient to have nodes with fewer resources (read: cheaper nodes), which in turn could translate into having more nodes available in a more extensive cloud.

A cloud data storage system could be extended to control distributed stored procedure processing, although admittedly the greatest benefits would be enjoyed by services that continuously process large amounts of data on the database server. Today too many developers prefer to just load the information from the database into their (server) application and process the queries outside the database. Many of them end up dealing with various performance and system resource problems.

Still, it would be pretty cool to have this feature available when one needs it.
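
To make the idea of combining nodes into one processing unit a bit more concrete, here is a hedged Python sketch in which worker threads stand in for cloud nodes. The function names are invented for the example, and it only covers procedures that can be split into a per-slice step and a final combining step (sums, counts and the like); anything that needs to see all rows at once would need a different approach.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_evenly(rows: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Split `rows` into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(rows), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(rows[start:end])
        start = end
    return slices


def run_distributed(
    rows: Sequence[int],
    node_count: int,
    partial: Callable[[Sequence[int]], int],
    combine: Callable[[List[int]], int],
) -> int:
    """Run `partial` on one slice per node, then `combine` the results."""
    slices = split_evenly(rows, node_count)
    # Each worker thread plays the role of one (small, cheap) node.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(partial, slices))
    return combine(partials)


total = run_distributed(list(range(1, 101)), node_count=4, partial=sum, combine=sum)
```

Because each node only ever holds a quarter of the rows here, none of them needs the memory to process the whole data set, which is the benefit the cheaper-nodes argument rests on.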

Track User's Geographic Location and Network Response Times
Larger clouds are likely to be global or at least span large geographic areas. When users move from place to place, or a project has people working on it from different locations, the network response time can easily become a performance issue. In such cases it could be desirable if the cloud could identify which node is closest to a user (distance measured in response time) and asynchronously replicate the data across the cloud while the user works through the fastest node. If the node’s performance level drops below a certain threshold, the cloud could assign another, faster node to the user without the user ever noticing any difference.
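
The node selection described above could look roughly like this in Python. The function names, the node names and the idea of expressing the threshold as a maximum response time in milliseconds are assumptions for the sketch; how the response times get measured is left open.

```python
from typing import Dict


def fastest_node(response_ms: Dict[str, float]) -> str:
    """Return the node with the lowest measured response time."""
    return min(response_ms, key=lambda name: response_ms[name])


def choose_node(current: str, response_ms: Dict[str, float], threshold_ms: float) -> str:
    """Keep the current node unless it has become too slow or unreachable."""
    if current in response_ms and response_ms[current] <= threshold_ms:
        return current
    return fastest_node(response_ms)


# Hypothetical measurements, in milliseconds, as seen from one user.
measured = {"helsinki": 20.0, "london": 45.0, "tokyo": 250.0}
```

With these numbers, a user currently on `london` stays there under a 100 ms threshold, while a user on `tokyo` would be moved to `helsinki`. Sticking with an acceptable node instead of always jumping to the fastest one avoids bouncing users around on small fluctuations.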

Mobile Nodes and Stationary Nodes
In certain cases the cloud could consist of two types of nodes: mobile nodes and stationary nodes. The stationary nodes could be the primary data stores, while mobile nodes could act as proxies between users and stationary cloud nodes when bandwidth is limited or there are security concerns (think military, but there are plenty of civilian cases as well). In these cases the users would query their local (mobile) node for data, and if the requested data is available, great. If not, the mobile node forwards the query to the closest (fastest responding) stationary node, which uploads the data to the mobile node for later use. Naturally these mobile nodes would form a cloud of their own, so the data uploaded to one mobile node is then replicated to all other mobile nodes within that local cloud.
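
The mobile node behaviour amounts to a read-through cache with local replication, which can be sketched in Python as follows. The class names, fields and sample data are invented for the example, and it assumes that every stationary node holds the same data, so asking only the fastest one is enough.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StationaryNode:
    name: str
    response_ms: float
    data: Dict[str, str] = field(default_factory=dict)


class MobileNode:
    """Proxy that answers locally when it can and forwards when it cannot."""

    def __init__(self, stationary: List[StationaryNode]) -> None:
        self.stationary = stationary
        self.cache: Dict[str, str] = {}
        self.peers: List["MobileNode"] = []  # other nodes in the local cloud

    def query(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]  # available locally, great
        # Forward to the fastest responding stationary node.
        source = min(self.stationary, key=lambda node: node.response_ms)
        value = source.data.get(key)
        if value is not None:
            self.cache[key] = value
            # Replicate to the other mobile nodes of the local cloud.
            for peer in self.peers:
                peer.cache[key] = value
        return value


base = [
    StationaryNode("far", 300.0, {"map": "grid-7"}),
    StationaryNode("near", 40.0, {"map": "grid-7"}),
]
unit_a, unit_b = MobileNode(base), MobileNode(base)
unit_a.peers.append(unit_b)
unit_b.peers.append(unit_a)

first_answer = unit_a.query("map")
```

After the first query, `unit_b` can answer the same request from its own cache without touching the limited link to the stationary nodes at all.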