Gian Trotta: Welcome
to another "First Look" podcast. I'm your host, ebizQ's Gian Trotta. A
recent ebizQ Webinar guest estimated that by the year 2010, the world's
data volume will reach one zettabyte. That's 1 followed by 21 zeros. And
companies that seek to mine and refine this for business or compliance
solutions will face a daunting and expensive task. With us today is
CopperEye CEO Kate Mitchell to define the problem and delineate some
solutions.
GT: Kate,
what IT complications and additional costs are associated with this
explosion of data?
KM: Part of
the challenge is being able to find the exact data that you're looking
for as the data volumes are growing so dramatically. As companies have
been continuing to increase the size of the database they are adding
larger hardware platforms so that performance remains constant, both on
the transaction side and the analytical side. But the cost of that
hardware and the footprint in the data center, along with the power
requirements, are getting to the point where it’s no longer
possible to continue just to do business the old-fashioned way. And
furthermore, when databases get to the multiple terabyte size, the
number of highly skilled DBAs that you need is another major challenge
-- finding those people as the data volumes are growing so dramatically
is a challenge.
GT: Kate,
can investing in traditional solutions keep pace with the data
explosion?
KM: In some
industries, that is the case. But actually, in others -- especially if
there is a business discontinuity like the advent of 3G in wireless
telecommunications or RFID, whether you look at it from the supplier's
side or the retailer's side -- you can no longer just throw money at the
problem. The explosion in data is so dramatic. Furthermore, something
that all industries face is regulatory issues where, in many cases,
you're being required to keep a lot more data for a lot longer, and on
top of that, you're being asked to provide very specific responses to
inquiries quickly. It's no longer a 30-day response or a one-week
response. In many cases, it's 24 hours or less.
GT: Right.
So do you see compliance as a major driver -- or the need to establish
some kind of business agility and advantage?
KM: It
depends again on the industry. Many industries have both of those
pressures where they're trying to improve their service levels. They're
trying to find a way to be more competitive with a focus on improving
business operations, which is the primary driver. And then in other
industries, it's the regulatory and compliance pressures that are
primary in terms of trying to find a way to manage this explosion of
data, without giving up the access to the very specific data that
you're looking for.
GT: Kate,
what is the experience of particular industries or verticals in this
regard?
KM: We've
spent a lot of time at CopperEye focused on the telecommunications
sector and they certainly are heavily regulated and specifically have a
need for, in this case, aiding to combat terrorism - to find very
specific information on individuals that are potential criminal
suspects. To be able to find calls they made, when they made them,
where they were, who they called, how often they called. That's on the
regulatory side.
On the business side, you have things like 3G -- third generation
wireless impacting the amount of data that they are dealing with. In
many cases, the estimates are 20 to 30 times the volume of data to
handle all the unique services on top of what cell phones are now being
used for such as audio and video and online shopping, and so forth. So,
I think that's an example of an industry, which is really feeling the
pinch from both sides. Financial services would be quite similar.
GT: Those
are good examples…
KM: You know
what's interesting; it's not just the wireless providers. This is also
a requirement for the Internet service providers - having to keep
every Web site visited along with the IP addresses. For the EU, that
doesn't come fully into play until 2009. It's the telecommunication
providers that need to have their plans in place by September of this
year. But it’s going to be far more complex when we talk
about IP addresses that have to be tracked for months or years.
GT: You
know, ISPs and wireless carriers -- how much price pressure is this
putting on them? That will in turn be passed on to my end? Do you think
CopperEye can really help offset this price pressure?
KM: We
generally are at least 75 percent less costly, all things considered,
hardware and personnel and software, than a relational database
approach. And we found in the first couple of years that we were
dealing with the wireless providers, that was a reasonable ROI to get
us in the door. What we have found more recently, just in the last six
months or so, is that the big companies are saying to us -- "it's not
even a consideration for me because of the volume, to use a relational
database. To get the performance that I would need, the cost is
prohibitive."
And more importantly now, the time to implement because of the deadline
coming up, the complexity of a relational database is a major issue --
we know of one particular case with a very large wireless provider who
has been working for more than a year with Oracle. You've got the
ongoing software costs; you've got the hardware costs. And on top of
that it's been more than a year of professional services to try to make
this work properly. That doesn't fly these days, especially if
there is an alternative, even if it's from a company that's not well
known, as long as we can prove it in a few days, even with a few billion
rows of data, and we can get a customer in production in a few weeks.
That just changes the economics so dramatically. The old-fashioned way
of doing things -- of throwing money at it -- is not even, in many cases,
a consideration. So that's good news for us.
GT: It is.
And for everyone, as in life…
KM: That's
right!
GT:
…It's not enough to be rich. It helps to be smart as well as
rich.
KM: That's
right!
GT: Okay.
How does a company like yours design a solution that can scale with
this data explosion?
KM: The
traditional way is just to keep all of the data that you think you're
ever going to need in the relational database. A database, a
relational database specifically, was designed for data that's changing.
And there's no better approach for concurrency with hundreds or even
thousands of users inside an organization and outside an organization.
And making sure you've got all of the capability for transactional
integrity whether that's two-phase commit or row-level locking. That's
exactly what the database was designed for.
However, when it comes to data that is not changing -- transaction data
or event data -- you really don't need the overhead or the complexity
that a relational database supplies. And so one other alternative is to
keep all the data that's changing in the relational database but for
data that's not changing, put that in a lower-cost, highly scalable
location, a simple flat file, but do not give up the immediate and
precise access that you get with the database. And that's what
CopperEye is finding that our customers are pleased with as a new innovation
in managing this vast volume of data without giving up very precise and
immediate access to the data.
GT: So it
seems your live archive solution creates multiple tiers. And how can
you provide the advantage of multiple tiers without increasing
complexity?
KM: You
know, I think back over the decades that I've been around the
technology business and there's always been one constant and that is,
with the power of a solution, comes complexity. And in this particular
case, this goes back to a capability that preceded the relational
database, the simple flat file system. And, so all CopperEye is doing
is taking that flat file system and giving you the best attributes of
that, which is the low cost and very high scalability with the
simplicity of managing that environment. And then we borrow one
innovation from the world of databases, which is indexing.
And when you apply innovative indexing, which is the core technology
foundation for CopperEye products, to a flat file -- what you
get are the best attributes of both.
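Illustration (not part of the conversation): the idea Kate describes, keeping unchanging records in a plain flat file and adding an index so that individual records can still be found quickly, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not CopperEye's technology: the record layout, the sample numbers, and the function names (`append_records`, `build_index`, `lookup`, `demo`) are all hypothetical, and the index is a simple in-memory dictionary of byte offsets.

```python
import os
import tempfile
from collections import defaultdict


def append_records(path, records):
    """Append comma-separated records to a flat file, one record per line."""
    with open(path, "ab") as f:
        for rec in records:
            f.write((",".join(rec) + "\n").encode("utf-8"))


def build_index(path, key_field=0):
    """Scan the file once and map each key to the byte offsets of its lines."""
    index = defaultdict(list)
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            key = line.decode("utf-8").rstrip("\n").split(",")[key_field]
            index[key].append(offset)
            offset += len(line)  # the next line starts where this one ends
    return index


def lookup(path, index, key):
    """Fetch only the matching lines by seeking to their recorded offsets."""
    results = []
    with open(path, "rb") as f:
        for offset in index.get(key, []):
            f.seek(offset)
            results.append(f.readline().decode("utf-8").rstrip("\n").split(","))
    return results


def demo():
    # Made-up call records: caller, callee, start time, duration in seconds.
    records = [
        ("447700900001", "447700900002", "2007-03-01T09:15:00", "62"),
        ("447700900003", "447700900001", "2007-03-01T09:16:30", "15"),
        ("447700900001", "447700900004", "2007-03-01T10:02:10", "240"),
    ]
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        append_records(path, records)
        index = build_index(path)  # index on the caller number (field 0)
        return lookup(path, index, "447700900001")
    finally:
        os.remove(path)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The file itself stays append-only and cheap to store; only the index knows where each key lives, so a lookup reads just the matching lines instead of scanning the whole file.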
GT: That's
very interesting. It's almost like you're cherry-picking through the
IT stack.
KM: That's
a good way to put it!
GT: Okay.
Kate, do you have any case studies of particular companies that
established solutions using CopperEye?
KM:
Yes, one of our first customers, Orange, one of the large wireless
providers in the UK. We've been working with them now for a little more
than seven years where they are tracking what they call "anomalous
calling patterns." And there could be a number of reasons why they
themselves are interested in that data - to be sure that they are
getting the revenue that they deserve from people using the
network. And more recently, for regulatory compliance with
the EU, that's the European Union, meeting data retention and retrieval
requirements. This is really focused on counterterrorism and other
criminal activity.
They found that storing all of that data, every call detail record from
every cell phone call that was made is just -- you cannot justify
putting that in your relational database. For the last seven years,
they've been working with CopperEye storing that data in a flat file
rather than the relational database. And, in fact, we added
up the other day that we've handled 500 billion transactions for them
over that period of time.
They went from storing these call detail records in a relational
database where they could only afford to keep ten percent of them for
two days, to then storing them in a simple flat file with CopperEye,
where they keep 100 percent for 40 days. And just to put that in
perspective, to meet the new EU mandates, rather than keeping all this
data for 40 days, the guideline is to keep it for a minimum of a year
and some EU member countries, like Ireland are saying that data needs
to be kept for three years.
So you can see the huge increase in the volume of data, and you simply
cannot justify the cost of keeping that in a relational database.
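Illustration (not part of the conversation): the scale of the jump Kate describes can be worked out from the figures she gives. The sketch below assumes, purely for the arithmetic, that the daily volume of call detail records stays constant and that a year is 365 days; the helper names `record_days` and `growth` are hypothetical. Stored volume is expressed in "days' worth of full call records".

```python
from fractions import Fraction


def record_days(fraction_kept, retention_days):
    """Stored volume, in days' worth of full call records."""
    return Fraction(fraction_kept) * retention_days


def growth(new, old):
    """How many times larger the new stored volume is than the old one."""
    return new / old


# Figures quoted in the interview.
relational = record_days(Fraction(1, 10), 2)   # 10 percent kept for two days
flat_file = record_days(1, 40)                 # 100 percent kept for 40 days
one_year = record_days(1, 365)                 # EU guideline: at least a year
three_years = record_days(1, 3 * 365)          # e.g. Ireland: three years

print(growth(flat_file, relational))    # 200
print(growth(one_year, relational))     # 1825
print(growth(three_years, relational))  # 5475
```

Under those assumptions, the move to the flat file already meant holding 200 times as much data as before, and a one-year mandate is roughly nine times more again.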
GT: Right.
That's on the compliance front. It's almost unavoidable. But what
business advantages has extending the archiving period
conferred on other businesses you deal with?
KM: Well,
you know, it's interesting. One of the things we haven't really brought
up, I'm wondering if we should. This approach is not for text or word
documents or web pages.
GT: Right.
KM: That's
what traditional search vendors do and it's based on vocabulary-based
search. CopperEye is focused on data that would otherwise live in the
database -- scalar data. And, for instance, there could be billions of
combinations for a particular key like a credit card number or customer
number or transaction ID, that's not vocabulary-based. So that's one
distinction that's pretty important to make.
GT: Right.
Even if it has to run through XML? Even when the data, the text is XMLized
and then made available through a Web service?
KM: Yes, if
it's in XML, we are just actually extending the product to be able to
natively support XML. But for now -- if all you're doing is searching
for e-discovery, and you’re looking for documents or Web
pages, there are other tools that are far better suited for that. They
are optimized for that.
GT: Right.
I think NetManage comes to mind…
KM: Yes.
Companies like Endeca and Fast and Google and Yahoo, those are the
companies that have text-based or vocabulary-based search tools, and
they pre-build all the indexes based on every important word or
relevant word in any particular language. And that's how they index and
then find those words. That's not possible to do at huge volumes if
you've got 15 billion combinations of transaction ID or customer ID
based on combinations of alphanumeric characters. So that's one
distinction that we make right up front with the prospect.
GT: You
explain the tiering between the flat file and the database and
cherry-picking there. How does that improve strategies for
infrastructure simplification?
KM: What
I've verified with an analyst that I highly respect -- Steve Duplessie
of the Enterprise Strategy Group who comes out of the storage world,
that historically you’ve had this tier I, tier II, tier III,
tier IV storage where you put the most important information on a
database on your most expensive storage and then, as the data ages
(the age of the data is what is usually used to determine its value,
which isn't always a good metric), customers start moving it down
to lower and lower cost storage, which is more difficult for retrieval.
What I've proposed to Steve is, "Does this not do away with worrying
about all those various tiers of storage? Doesn't this say it's either
in the production database or it's on the lowest-cost storage to
begin with, with CopperEye being able to retrieve it, and doesn't that
collapse all of those levels?" And he said, "Well, probably,
but we would never talk about that yet because that's too big a jump,
too big a leap for people to grasp."
GT: Kate --
can you cite an example where you contributed to a business agility
and/or customer service?
KM:
Actually, we have a customer called Message Labs and they provide
hosted emails, security, virus and spam checking. They are based in the
UK. They've got about 13,000 corporate customers and 35 million email
accounts, and they see about 8 billion emails a month. And one of the
challenges they were having with the log data was the tracking of
these emails across their twelve data centers around the world. And
they had a 24-hour service level agreement in place with these
corporate customers that if you hadn't gotten an email with an
important attachment, a contract or something, Message Labs needed 24
hours to locate the email and notify them. As part of the search
process, their network help desk would notify their operations people.
Their operations people would actually be trolling through these log
files trying to find literally that needle in the haystack; that one
email out of the billions that they handle in any particular month.
And they were finding that customers were saying 24-hour turnaround was
simply much too long. So what Message Labs did is they implemented
CopperEye and rather than having network people and operations center
people tracking this data, CopperEye provides direct access to these
log files. Message Labs has indexed every one of the emails and what
they have now done is set up a customer self-service portal. So within
a few seconds, the customer goes online. He enters either the recipient
or the sender or the subject, and within a few seconds gets back
information on the email: it's quarantined over here for this reason.
If it's from a trusted individual, you can release it. And so that has
dramatically improved their customer service while, interestingly,
lowering their costs. So that's one of those situations I think you can
classify as a win-win.
GT: By all
means. Those are the kind that remain telling and are of the most value to
our users. And on that note, I wanted to say that we hope to check back
with you in a few months and have some additional case studies and
developments, and thank you for taking the time out of a busy schedule
for this podcast.
KM: Hey, it
was my pleasure, Gian.
GT: Okay!
Kate, are there any sites where folks can learn more about the
solutions you described?
GT: Okay!
We'll point our listeners there. And I'd like to note again for our
listeners, if you're hearing this podcast and would like to engage Kate
with follow-up questions, the address as always is
www.ebizq.net/firstlook. You'll also find dozens of cutting-edge
virtual conferences, Webinars, podcasts, white papers and news that
inform, empower and entertain at ebizQ.net. This is ebizQ producer Gian
Trotta, thanking you for your time and signing off.