
CopperEye Takes a Hard Look at Data Storage

08/17/2007

First Look with CopperEye's Kate Mitchell


Listen to the entire 17:26 podcast

Gian Trotta: Welcome to another "First Look" podcast. I'm your host, ebizQ's Gian Trotta. A recent ebizQ Webinar guest estimated that by the year 2010, the world's data volume will reach one zettabyte. That's 1 followed by 21 zeros. And companies that seek to mine and refine this for business or compliance solutions will face a daunting and expensive task. With us today is CopperEye CEO Kate Mitchell to define the problem and delineate some solutions.

Welcome, Kate, and thanks for joining us.

KM: Nice to be here, Gian.

GT: Kate, what IT complications and additional costs are associated with this explosion of data?

KM: Part of the challenge is being able to find the exact data that you're looking for as the data volumes grow so dramatically. As companies continue to increase the size of the database, they add larger hardware platforms so that performance remains constant, both on the transaction side and the analytical side. But the cost of that hardware and the footprint in the data center, along with the power requirements, are getting to the point where it's no longer possible to continue just doing business the old-fashioned way. And furthermore, when databases get to the multiple terabyte size, finding the number of highly skilled DBAs that you need is another major challenge.

GT: Kate, can investing in traditional solutions keep pace with the data explosion?

KM: In some industries, that is the case. But in others, especially if there is a business discontinuity like the advent of 3G in wireless telecommunications, or RFID, whether you look at it from the supplier's side or the retailer's side, you can no longer just throw money at the problem. The explosion in data is so dramatic. Furthermore, something that all industries face is regulatory issues where, in many cases, you're being required to keep a lot more data for a lot longer, and on top of that, you're being asked to provide very specific responses to inquiries quickly. It's no longer a 30-day response or a one-week response. In many cases, it's 24 hours or less.

GT: Right. So do you see compliance as a major driver -- or the need to establish some kind of business agility and advantage?

KM: It depends again on the industry. Many industries have both of those pressures where they're trying to improve their service levels. They're trying to find a way to be more competitive with a focus on improving business operations, which is the primary driver. And then in other industries, it's the regulatory and compliance pressures that are primary in terms of trying to find a way to manage this explosion of data, without giving up the access to the very specific data that you're looking for.

GT: Kate, what is the experience of particular industries or verticals in this regard?

KM: We've spent a lot of time at CopperEye focused on the telecommunications sector. They certainly are heavily regulated, and specifically they have a need, in this case to help combat terrorism, to find very specific information on individuals who are potential criminal suspects: to be able to find calls they made, when they made them, where they were, who they called, how often they called. That's on the regulatory side.

On the business side, you have things like 3G -- third generation wireless -- impacting the amount of data that they are dealing with. In many cases, the estimates are 20 to 30 times the volume of data to handle all the unique services on top of what cell phones are now being used for, such as audio and video and online shopping, and so forth. So, I think that's an example of an industry that is really feeling the pinch from both sides. Financial services would be quite similar.

GT: Those are good examples…

KM: You know what's interesting? It's not just the wireless providers. This is also a requirement for the Internet service providers - having to keep every Web site visited along with the IP addresses. For the EU, that doesn't come fully into play until 2009. It's the telecommunication providers that need to have their plans in place by September of this year. But it's going to be far more complex when we talk about IP addresses that have to be tracked for months or years.

GT: You know, ISPs and wireless carriers -- how much price pressure is this putting on them? Will that in turn be passed on to my end? Do you think CopperEye can really help offset this price pressure?

KM: We generally are at least 75 percent less costly, all things considered -- hardware and personnel and software -- than a relational database approach. And we found in the first couple of years that we were dealing with the wireless providers, that was a reasonable ROI to get us in the door. What we have found more recently, just in the last six months or so, is that the big companies are saying to us, "It's not even a consideration for me, because of the volume, to use a relational database. To get the performance that I would need, the cost is prohibitive."

And more importantly now, because of the deadline coming up, the time to implement and the complexity of a relational database are a major issue. We know of one particular case with a very large wireless provider who has been working for more than a year with Oracle. You've got the ongoing software costs; you've got the hardware costs. And on top of that it's been more than a year of professional services to try to make this work properly. That doesn't fly these days, especially if there is an alternative, even if it's from a company that's not well known, as long as we can prove it in a few days, even with a few billion rows of data, and we can get a customer in production in a few weeks. That just changes the economics so dramatically. The old-fashioned way of doing things, of throwing money at it, is in many cases not even a consideration. So that's good news for us.

GT: It is. And for everyone, as in life…

KM: That's right!

GT: …It's not enough to be rich. It helps to be smart as well as rich.

KM: That's right!

GT: Okay. How does a company like yours design a solution that can scale with this data explosion?

KM: The traditional way is just to keep all of the data that you think you're ever going to need in the relational database. A relational database was designed specifically for data that's changing. And there's no better approach for concurrency with hundreds or even thousands of users inside an organization and outside an organization, and for making sure you've got all of the capability for transactional integrity, whether that's two-phase commit or row-level locking. That's exactly what the database was designed for.

However, when it comes to data that is not changing -- transaction data or event data -- you really don't need the overhead or the complexity that a relational database brings. And so one alternative is to keep all the data that's changing in the relational database, but for data that's not changing, put that in a lower-cost, highly scalable location, a simple flat file, without giving up the immediate and precise access that you get with the database. And that's what CopperEye is finding that our customers are pleased with as a new innovation in managing this vast volume of data.

GT: So it seems your live archive solution creates multiple tiers. And how can you provide the advantage of multiple tiers without increasing complexity?

KM: You know, I think back over the decades that I've been around the technology business and there's always been one constant, and that is that with the power of a solution comes complexity. In this particular case, this goes back to a capability that preceded the relational database, the simple flat file system. So all CopperEye is doing is taking that flat file system and giving you the best attributes of that, which are the low cost and very high scalability, with the simplicity of managing that environment. And then we borrow one innovation from the world of databases, which is indexing.

And when you apply innovative indexing, which is the core technology foundation for CopperEye products, to a flat file, what you get are the best attributes of both.
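
[Editor's illustration] The interview does not describe how CopperEye's indexing actually works, so the following is only a minimal sketch of the general idea of putting an index on top of a flat file. It assumes a plain in-memory dictionary that maps a key to the byte offsets of matching lines; the file layout (caller, callee, timestamp), the phone numbers, and the function names are all hypothetical.

```python
import os
import tempfile
from collections import defaultdict


def append_records(path, records):
    """Append records to the flat file, one comma-separated line per record."""
    with open(path, "ab") as f:
        for rec in records:
            f.write((",".join(rec) + "\n").encode("utf-8"))


def build_index(path, key_field):
    """Scan the file once and map each key value to the byte offsets of its lines."""
    index = defaultdict(list)
    with open(path, "rb") as f:
        while True:
            offset = f.tell()          # position where the next line starts
            line = f.readline()
            if not line:               # end of file
                break
            fields = line.decode("utf-8").rstrip("\n").split(",")
            index[fields[key_field]].append(offset)
    return index


def lookup(path, index, key):
    """Seek directly to the matching lines instead of scanning the whole file."""
    results = []
    with open(path, "rb") as f:
        for offset in index.get(key, []):
            f.seek(offset)
            results.append(f.readline().decode("utf-8").rstrip("\n").split(","))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "calls.txt")
        # Hypothetical call records: caller, callee, timestamp.
        append_records(path, [
            ("447700900001", "447700900002", "2007-08-01T09:15:00"),
            ("447700900003", "447700900001", "2007-08-01T09:16:30"),
            ("447700900001", "447700900004", "2007-08-01T10:02:10"),
        ])
        index = build_index(path, key_field=0)   # index on the caller column
        for record in lookup(path, index, "447700900001"):
            print(record)
```

The data itself stays in an append-only text file, and only the index needs to know where each record lives. In this sketch the index must be rebuilt (or extended) after new records are appended, and it lives entirely in memory, which would not hold up at the volumes discussed in the interview.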

GT: That's very interesting. It's almost like you're cherry-picking through the IT stack.

KM: That's a good way to put it!

GT: Okay. Kate, do you have any case studies of particular companies that established solutions using CopperEye?

KM: Yes, one of our first customers is Orange, one of the large wireless providers in the UK. We've been working with them now for a little more than seven years, where they are tracking what they call "anomalous calling patterns." And there could be a number of reasons why they themselves are interested in that data - to be sure that they are getting the revenue that they deserve from people using the network. And more recently, for regulatory compliance with the EU, that's the European Union, meeting data retention and retrieval requirements. This is really focused on counterterrorism and other criminal activity.

They found that storing all of that data, every call detail record from every cell phone call that was made, is just -- you cannot justify putting that in your relational database. For the last seven years, they've been working with CopperEye, storing that data in a flat file rather than the relational database. And, in fact, we added up the other day that we've handled 500 billion transactions for them over that period of time.

They went from storing these call detail records in a relational database, where they could only afford to keep ten percent of them for two days, to storing them in a simple flat file with CopperEye, where they keep 100 percent for 40 days. And just to put that in perspective, to meet the new EU mandates, rather than keeping all this data for 40 days, the guideline is to keep it for a minimum of a year, and some EU member countries, like Ireland, are saying that data needs to be kept for three years.

So you can see the huge increase in the volume of data, and you simply cannot justify the cost of keeping that in a relational database.
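
[Editor's illustration] The retention figures above can be turned into rough relative volumes. This is a back-of-the-envelope sketch only: it assumes the same number of call detail records every day and a 365-day year, neither of which is stated in the interview, and the function name is made up for the example.

```python
DAYS_PER_YEAR = 365  # assumption: leap years ignored


def stored_volume(percent_kept, days_kept):
    """Stored volume in 'percent-days', assuming a constant daily record count."""
    return percent_kept * days_kept


relational = stored_volume(10, 2)                     # 10 percent kept for 2 days
flat_40_days = stored_volume(100, 40)                 # 100 percent kept for 40 days
one_year = stored_volume(100, DAYS_PER_YEAR)          # EU minimum guideline
three_years = stored_volume(100, 3 * DAYS_PER_YEAR)   # e.g. Ireland

print(flat_40_days / relational)    # 200.0
print(one_year / flat_40_days)      # 9.125
print(three_years / flat_40_days)   # 27.375
```

Under those assumptions, the 40-day flat file already holds about 200 times what the relational database held, and the one-year and three-year requirements multiply that again by roughly 9 and 27 respectively.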

GT: Right. That's on the compliance front. It's almost unavoidable. But what business advantages has extending the archiving period conferred on other businesses you deal with?

KM: Well, you know, it's interesting. One of the things we haven't really brought up, I'm wondering if we should. This approach is not for text or word documents or web pages.

GT: Right.

KM: That's what traditional search vendors do and it's based on vocabulary-based search. CopperEye is focused on data that would otherwise live in the database -- scalar data. And, for instance, there could be billions of combinations for a particular key like a credit card number or customer number or transaction ID, that's not vocabulary-based. So that's one distinction that's pretty important to make.

GT: Right. Even if it has to run through XML? Even when the data, the text is XMLized and then made available through a Web service?

KM: Yes, if it's in XML, we are actually just extending the product to be able to natively support XML. But for now -- if all you're doing is searching for e-discovery, and you're looking for documents or Web pages, there are other tools that are far better suited for that. They are optimized for that.

GT: Right. I think NetManage comes to mind…

KM: Yes. Companies like Endeca and Fast and Google and Yahoo, those are the companies that have text-based or vocabulary-based search tools, and they pre-build all the indexes based on every important word or relevant word in any particular language. And that's how they index and then find those words. That's not possible to do at huge volumes if you've got 15 billion combinations of transaction ID or customer ID based on combinations of alphanumeric characters. So that's one distinction that we make right up front with the prospect.

GT: You explain the tiering between the flat file and the database and cherry-picking there. How does that improve strategies for infrastructure simplification?

KM: Here's what I've verified with an analyst that I highly respect, Steve Duplessie of the Enterprise Strategy Group, who comes out of the storage world. Historically you've had this tier I, tier II, tier III, tier IV storage, where you put the most important information in a database on your most expensive storage. Then, as the data ages (age is what is usually used to determine the value of the data, which isn't always a good metric), customers start moving it down to lower and lower cost storage, which is more difficult for retrieval.

What I've proposed to Steve is, "Does this not do away with worrying about all those various tiers of storage? Doesn't this say it's either in the production database or it's on the lowest cost storage to begin with, with CopperEye being able to retrieve it, and doesn't that collapse all of those levels?" And he said, "Well, probably, but we would never talk about that yet because that's too big a jump, too big a leap for people to grasp."

GT: Kate -- can you cite an example where you contributed to business agility and/or customer service?

KM: Actually, we have a customer called Message Labs and they provide hosted email security, virus and spam checking. They are based in the UK. They've got about 13,000 corporate customers and 35 million email accounts, and they see about 8 billion emails a month. And one of the challenges they were having with the log data was the tracking of these emails across their twelve data centers around the world. They had a 24-hour service level agreement in place with these corporate customers: if you hadn't gotten an email with an important attachment, a contract or something, Message Labs needed 24 hours to locate the email and notify them. As part of the search process, their network help desk would notify their operations people. Their operations people would actually be trolling through these log files trying to find literally that needle in the haystack; that one email out of the billions that they handle in any particular month.

And they were finding that customers were saying a 24-hour turnaround was simply much too long. So what Message Labs did is they implemented CopperEye, and rather than having network people and operations center people tracking this data, CopperEye provides direct access to these log files. Message Labs has indexed every one of the emails, and what they have now done is set up a customer self-service portal. The customer goes online, enters either the recipient or the sender or the subject, and within a few seconds gets back information on the email: it's quarantined over here for this reason. If it's from a trusted individual, you can release it. And so that has dramatically improved their customer service while, interestingly, lowering their costs. So that's one of those situations I think you can classify as a win-win.
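
[Editor's illustration] The self-service lookup described above, searching by recipient, sender, or subject, can be sketched as one index per searchable field. Nothing here reflects the real Message Labs portal or CopperEye's product: the log entry layout, the status strings, and the function names are hypothetical, and the whole thing is held in memory purely for illustration.

```python
from collections import defaultdict

SEARCHABLE_FIELDS = ("recipient", "sender", "subject")


def build_field_indexes(log_entries):
    """Build one index per searchable field: value -> positions in log_entries."""
    indexes = {field: defaultdict(list) for field in SEARCHABLE_FIELDS}
    for position, entry in enumerate(log_entries):
        for field in SEARCHABLE_FIELDS:
            # Lower-case the value so lookups are case-insensitive.
            indexes[field][entry[field].lower()].append(position)
    return indexes


def find_emails(log_entries, indexes, field, value):
    """Return every log entry whose given field matches the value exactly."""
    positions = indexes[field].get(value.lower(), [])
    return [log_entries[i] for i in positions]


if __name__ == "__main__":
    # Hypothetical log entries, one per email handled.
    log = [
        {"recipient": "pat@example.com", "sender": "lee@example.org",
         "subject": "Signed contract", "status": "quarantined: attachment type"},
        {"recipient": "sam@example.com", "sender": "lee@example.org",
         "subject": "Lunch", "status": "delivered"},
    ]
    indexes = build_field_indexes(log)
    for entry in find_emails(log, indexes, "subject", "signed contract"):
        print(entry["recipient"], "->", entry["status"])
```

Because each field has its own index, a customer can start from whichever detail they remember, and the lookup never has to read through the full log.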

GT: By all means. Those are the kind that remain telling and are of the most value to our users. And on that note, I wanted to say that we hope to check back with you in a few months and have some additional case studies and developments, and thank you for taking the time out of a busy schedule for this podcast.

KM: Hey, it was my pleasure, Gian.

GT: Okay! Kate, are there any sites where folks can learn more about the solutions you described?

KM: The best site to go to for now is www.coppereye.com.

GT: Okay! We'll point our listeners there. And I'd like to note again for our listeners, if you're hearing this podcast and would like to engage Kate with follow-up questions, the address as always is www.ebizq.net/firstlook. You'll also find dozens of cutting-edge virtual conferences, Webinars, podcasts, white papers and news that inform, empower and entertain at ebizQ.net. This is ebizQ producer Gian Trotta, thanking you for your time and signing off. 
