What follows is my podcast with Chris Piedmonte, CTO of Algebraix. We discuss the challenges organizations face when trying to implement analytics on massive volumes of data. Chris will offer his insight on why the relational model is insufficient for today's data needs, and how Algebraix can help.
Listen to my 9:55 podcast below:
KB: Can you please provide us with a brief overview of your company?
CP: I'll be happy to. Well, Algebraix Data Corporation was started to develop data management software around a new means of modeling data mathematically. The math we're using is a collection of algebras that was developed based on the idea that information is best represented by binary pairs of information, not single values, and that the ability to manipulate the structure of the data is just as important as the ability to manipulate the values when it comes to designing and building systems that are efficient, easy to use, and can deliver high value to our customers.
The concept actually goes back to the 1960s, and it's the same basic concept that the relational data model was based on, but our approach is far more mathematically rigorous and sound than the approach that was originally taken by Ted Codd back when he was doing his original work on the relational data model. By applying the mathematics, we've been able to develop a new technology for storing and manipulating data, and we call the technology Algebraix, and the company Algebraix Data Corporation, because it's based on this collection of algebras we've created here at ADC.
The math itself has been very useful in solving a lot of problems that we've seen in the relational implementations that exist today. For example, the software itself is capable of self-managing, and self-organizing, and self-optimizing based on an application of the mathematics we've developed. What that does for our users is they no longer need to worry about deciding on data structures, indexes, partitioning, and how to allocate storage for specific types of databases or for how information should be served up to the users. All of this is automated through the application of the mathematics we have.
KB: Today's topic is The Challenges Organizations Face When Trying to Implement Analytics on Massive Volumes of Data. So Chris, your company has created a very different way of managing data. What's wrong with the relational model and why do you think people need something else?
CP: Well first off, it's not that there's really anything wrong with the relational data model. During the '70s and '80s, the relational model was popularized because it solved some problems. It allowed for standardization, and it provided a means of organizing typical business information and business data in such a way that it could be easily stored, retrieved, and integrated with applications. As you know, today the relational database is the standard for information management, and it's a multibillion-dollar market at this point, but it's not suited to all applications. The internet, the power of computers today, and a plethora of new applications that do things other than manage tabular business data have caused the creation of new data structures and new means of using information that are more diverse, and the relational data model wasn't designed to handle that kind of thing.
This can include hierarchical databases, graphs, audio, video; even business documents today are far more complex than what you can easily model with a row-and-column type of store. To store all this new information relationally is difficult. You've got to go through all kinds of unusual and unnatural acts to get the information into these relational data structures. Once you've done that, the SQL language and other techniques used to access the information are difficult for the users. They're complicated and hard to maintain, and all of this just creates a lot of manual work that has to be overcome through effort and a considerable amount of time and money.
It can be very inefficient in its execution as well. Systems can become very slow and very unresponsive when they have to deal with these complex non-relational type structures within the relational databases. Our approach is different. Rather than require the DBAs to design and tune these relational data models, our software automates the process. While the software is running, it analyzes the data and the queries that the users are presenting, and then automatically creates new structures and new access methods to accelerate the performance and streamline the users' experience accessing the information through their business applications.
Because of this, a vast majority of the work that people have to do to bring up a database is eliminated: designing the logical system and the physical data model, building indexes, deciding on partitions, deciding how the information will be spread across various disks and various systems. It's all automated by our software. The advantage of course is the reduction in cost and time to bring up a database, resulting in what we hope to be a 10:1 cost-performance advantage.
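As a rough illustration of the idea of software that watches the query workload and creates access structures on its own, here is a minimal Python sketch. It is not Algebraix's implementation: the `AdaptiveTable` class, its `threshold` parameter, and the sample rows are all hypothetical, and the only technique shown is building a hash index on a column once that column has been filtered on often enough.

```python
from collections import Counter, defaultdict


class AdaptiveTable:
    """Toy table that builds an index on a column once it is queried often.

    Hypothetical example; not based on any real product's API.
    """

    def __init__(self, rows, threshold=3):
        self.rows = rows
        self.threshold = threshold      # assumed tuning knob: queries before indexing
        self.query_counts = Counter()   # how often each column has been filtered on
        self.indexes = {}               # column name -> {value: [rows]}

    def find(self, column, value):
        """Return all rows where row[column] == value."""
        self.query_counts[column] += 1
        # Build the index automatically once the column is "hot".
        if column not in self.indexes and self.query_counts[column] >= self.threshold:
            self._build_index(column)
        if column in self.indexes:
            return self.indexes[column].get(value, [])
        # No index yet: fall back to a full scan.
        return [row for row in self.rows if row[column] == value]

    def _build_index(self, column):
        index = defaultdict(list)
        for row in self.rows:
            index[row[column]].append(row)
        self.indexes[column] = index


table = AdaptiveTable([
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Dallas"},
    {"id": 3, "city": "Austin"},
])

# The first two lookups scan; the third triggers index creation.
for _ in range(3):
    result = table.find("city", "Austin")

result_ids = [row["id"] for row in result]
city_is_indexed = "city" in table.indexes
```

The caller never asks for an index; the table decides based on observed usage, which is the general behavior described in the answer above.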
KB: You mentioned the relational model first became popular in the 70s and 80s. Was the relational model always broken? If it worked back then but doesn't now, what's changed?
CP: Well, as I said earlier, it's not that it was broken but simply insufficient for today's data needs. The overall concept that Ted Codd proposed is a great model for tabular business data and for transactional information, but it only goes so far as to specify a logical model. It doesn't specify how you should implement the storage, the retrieval, and the access methods used to get at this information. It's a way of thinking about data and describing how to manipulate it logically. I guess that's the point I want to make. The physical implementation and how the data is physically managed is really the key to providing efficiency and high availability of information.
The real problem isn't that the relational model's broken; it's the implementations that are in need of some help. In general, every relational database company has some technique, some trick, some way of doing things that's unique to them that allows them to claim some superiority in some application. The best example of course is Oracle, whose transaction processing technology clearly has been the industry leader for some time. But in the '80s, other companies like Teradata, for example, sprang up to address the problem of doing analytics. Their solution was to use very specific hardware and access techniques to accelerate performance for analytic work, but they couldn't do the transactional work that Oracle did.
In almost all of these cases, the solution is built from custom structures and access methods designed for a particular purpose. Really, there's no such thing as a general-purpose relational database. And as I said, to make things worse, things have changed a lot since this work was done in the '70s and '80s. In the '90s, the internet came along, computers got more powerful, and we were able to store a lot more information. The network connectivity allowed us to move all that information around or access information at remote locations. XML came along, audio, video, all of these things, and none of this existed when the relational data model was conceived. None of these existed when the first implementations of the relational data model were built.
So it wasn't designed for these things and that's one of the reasons the relational model today is not capable of serving all of the business needs of your typical enterprise company today.
KB: This format of rows, columns, and tables is now incorporated into the very foundation of almost every single database in the world. Will people ever be able to change it?
CP: Well of course, eventually things will change, they always do. It's not so much a question of do we need to change it or will it change. Just as we moved away from less efficient ways of doing things in the past to better ways of doing things, that will happen here as well as new systems come along. The traditional row-and-column relational database will always have its place, but when it comes to managing new types of information and new types of data structures, a different technology and a different approach is required.
Integrating all this information is key too, and that's where we think one of our strengths is going to be. With our mathematical model, we're capable of modeling the relational database model, hierarchical models, XML, and many other ways of organizing information in a single unified database. And I do think it's that ability that's going to make our technology unique and valuable in the future. To be able to co-exist with all the existing legacy applications, to provide the kind of data services and integration necessary to pull all this stuff together, and to truly provide a universal, integrated model is what ADC is all about.
KB: Let's talk about some examples. What kinds of things aren't possible today and why do they need to be?
CP: Well, there's a lot of things where the relational data model and its implementations fall short today. It can be as simple as having two different applications that want to do two different things accessing the same database. Perhaps a customer service application and some type of order fulfillment system, for example, might want to get access to the same data to run these applications. The challenge always is that the nature of the application dictates how the information should be stored and structured. And the way it's organized and structured for one application may not be beneficial for the other, so it's entirely possible that you get an implementation that's optimized for neither, or that has been optimized for one application and not the other.
This is a problem, and this problem exists because of the nature of how these systems are done today. You are only allowed one physical model for the logical model that you are exposing. In our technology, it's different. You can actually have multiple physical models, each automated and managed by our software, such that both of these applications will see the same information but will access it through entirely different access methods and get at different data structures to provide the kind of efficiency and performance that they need.
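To make the idea of several physical models behind one logical model concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of how ADC's software works: the `OrderStore` class, its method names, and the sample orders are hypothetical. The same logical set of orders is kept in two layouts, one keyed by customer (for a customer service application) and one keyed by status (for an order fulfillment system).

```python
from collections import defaultdict


class OrderStore:
    """One logical set of orders, kept in two hypothetical physical layouts."""

    def __init__(self):
        self._by_customer = defaultdict(list)  # layout suited to customer service lookups
        self._by_status = defaultdict(list)    # layout suited to fulfillment queues

    def insert(self, order):
        # Every logical insert updates both physical structures,
        # so both applications always see the same information.
        self._by_customer[order["customer"]].append(order)
        self._by_status[order["status"]].append(order)

    def orders_for_customer(self, customer):
        return list(self._by_customer.get(customer, []))

    def orders_with_status(self, status):
        return list(self._by_status.get(status, []))


store = OrderStore()
store.insert({"id": 1, "customer": "acme", "status": "open"})
store.insert({"id": 2, "customer": "acme", "status": "shipped"})
store.insert({"id": 3, "customer": "globex", "status": "open"})

# Customer service view: everything for one customer.
acme_ids = [order["id"] for order in store.orders_for_customer("acme")]

# Fulfillment view: everything still open, regardless of customer.
open_ids = [order["id"] for order in store.orders_with_status("open")]
```

Each application reads through the structure that suits its access pattern, while the data itself is inserted only once at the logical level. The trade-off in this sketch is extra work and storage on every write.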
Other types of issues that come up are just the sheer amount of data. If you look back at databases in the 70s and 80s, managing megabytes of data was what was going on. Today, most companies are dealing with gigabytes if not terabytes of information. These systems weren't designed for those things. The nature of that type of processing is very different. And once again, new technology is required.
Finally, there's the internet and the fact that information is distributed. Networks per se didn't exist back when the relational implementations were created. There wasn't the idea that information could be spread across a computer network within a building, much less around the world. Databases that have the ability to deal with information distributed in that way, and that allow users to pose queries and answer questions where the information is not local or perhaps not even all in the same place, are what's going to be necessary for the databases of the future. Once again, we here at ADC and the technologies we're developing will be able to address those problems.