Companies are experimenting more and more with petabytes of unstructured data like social media entries, customer support emails, complaints, suggestion boxes, etc. These efforts increasingly rely on Hadoop, MapReduce and NoSQL distributed databases like MongoDB, Cassandra, Riak and so on.
On the other hand, you have relational databases holding lots and lots of structured data, such as transactional data, in humongous data warehouses. Data warehousing efforts in many organizations are maturing rapidly, and with storage getting less expensive every year, more data can be stored and used.
How do you build unified business intelligence that combines all of the older transactional, structured data with unstructured but nevertheless very valuable data?
Pentaho seems to have come up with some very slick answers.
They have tools that simply pick up and merge Hadoop data with other structured and unstructured data from relational databases, data warehouses and newer distributed databases that store unstructured data, such as MongoDB and Cassandra.
The nice thing about this approach and toolset is that you don't need to muck around with highly technical stuff such as MapReduce to be able to do business intelligence.
Merging legacy transactional data with unstructured data may help organizations answer questions such as "What are our top regions for sales of widget X and what do people say about our product or service from that region?" or "How do negative perceptions of product X correlate with the changes in sales of product X in region Y?"
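To make the first of those questions concrete, here is a minimal sketch in plain Python of what such a merge boils down to. It is not Pentaho's API or method; the sample rows, the word lists and the helper names (`score`, `merge_by_region`) are all made up for illustration, and the "sentiment" is just a crude keyword count.

```python
from collections import defaultdict

# Hypothetical structured rows, as they might come out of a data warehouse query.
sales = [
    {"region": "North", "product": "widget X", "units": 120},
    {"region": "South", "product": "widget X", "units": 80},
    {"region": "North", "product": "widget X", "units": 30},
]

# Hypothetical unstructured feedback, as it might come out of a document store.
feedback = [
    {"region": "North", "text": "Love widget X, great value"},
    {"region": "South", "text": "widget X broke after a week, terrible"},
    {"region": "North", "text": "widget X is great"},
]

# Toy word lists (an assumption); a real system would use a proper sentiment model.
POSITIVE = {"love", "great"}
NEGATIVE = {"broke", "terrible"}


def score(text):
    """Count positive words minus negative words in a piece of free text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def merge_by_region(sales_rows, feedback_items):
    """Combine units sold and feedback sentiment into one record per region."""
    merged = defaultdict(lambda: {"units": 0, "sentiment": 0, "comments": 0})
    for row in sales_rows:
        merged[row["region"]]["units"] += row["units"]
    for item in feedback_items:
        entry = merged[item["region"]]
        entry["sentiment"] += score(item["text"])
        entry["comments"] += 1
    return dict(merged)


report = merge_by_region(sales, feedback)

# List regions from highest to lowest sales, with sentiment alongside.
for region, stats in sorted(report.items(), key=lambda kv: -kv[1]["units"]):
    print(region, stats)
```

The point of the sketch is only that both kinds of data end up keyed on something they share (here, the region), which is the part that a BI tool would handle for you.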
Being able to merge these two kinds of data quickly is crucial, and not being able to do so is a huge pain point in business intelligence these days.
I'm sure that once you have ways of combining structured and unstructured data, all kinds of fancy business intelligence can be extracted.
Pentaho has done a terrific job of leading the market in these kinds of efforts. Very slick and useful.
Companies such as Google, Facebook, Yahoo and others have made tremendous contributions to storing and retrieving unstructured data in an efficient and constantly available manner, using distributed computing and the cloud with Hadoop and MapReduce. These technologies are now ready to be used commercially in other organizations.
It's a big gap that Pentaho is bridging.
But which is the stone that supports the bridge? - Kublai Khan