This past weekend, I installed an energy-efficient tankless water heater in my house. I decided to install it myself, lured by marketing promises that the unit would be "easy to install".
On the way back from my sixth visit to Lowe's in two days, it occurred to me that any meaningful home project inevitably requires at least three trips to the hardware store, plus one more to return all the parts you bought but didn't use.
During the drive, it struck me that solving the "Big Data Problem" loosely parallels my latest home improvement project: a single trip to the hardware/software vendor isn't likely to solve your problem. And at the end of the experience, you'll probably be left with a box of things you thought you'd need that turned out not to fit.
Of course, it's pretty easy to return things to your local home improvement store and get at least a store credit... try THAT with a hardware/software vendor.
Anyhow, there is a slew of interesting software technologies out there, almost too many to mention: Hadoop, columnar data stores, MapReduce, highly parallel algorithms that exploit multi-core processors, numerous approaches built on tightly coupled and loosely coupled parallelism, data manipulation and processing based on extended set mathematics, and more.
In the end, the big, nasty problems will probably require multiple trips to the technology store.
The upcoming series of posts will examine the various technologies purported to solve the Big Data Problem. I'll discuss their strengths and weaknesses, as well as the types of problems for which each is best suited.
First on the list - Apache's Hadoop framework. There must be a reason for all the publicity it's been getting.