
Business-Driven Architect

Brenda Michelson

ACM Queue: Interview with Amazon's Werner Vogels on Services, SOA, and Business Innovation


The May issue of ACM Queue has a great interview with Amazon's CTO Werner Vogels, conducted by Microsoft Technical Fellow Jim Gray. In the interview, they discuss Amazon's technology transformation from monolithic web applications to a service-oriented architecture. This isn't just a technology transformation story, but also a business one. As Amazon shifted its application architecture to a services approach, it was able to shift from being an online retailer to a business services and technology platform company that just happens to have an online retail business with 55 million customers.

In the interview, they cover a lot of ground: the reasons for SOA, the resulting solutions, lessons learned, use of standards, use of tools, developers as artists, architects as creative thinkers, business strategy alignment, innovation, and relentless customer focus.

In this post, I want to call out four topics (and the corresponding parts of the conversation) that I found especially interesting. The full article is available (free) here.

Why SOA? Amazon realized that its traditional web applications and its approaches to database access and scaling were constraining business growth, not just in terms of customers, products, and categories, but also in terms of new business offerings.

WV Growth is core to Amazon.com's business strategy, and that has had a significant impact on the way we use technology: growth through more categories, a larger selection, more services, more buying customers, more sellers, more merchants, more developers, increasing the different access methods, and expanding delivery mechanisms. The impact has been on many areas: larger data sets, faster update rates, more requests, more services, tighter SLAs (service-level agreements), more failures, more latency challenges, more service interdependencies, more developers, more documentation, more programs, more servers, more networks, more data centers. A large part of Amazon.com's technology evolution has been driven to enable this continuing growth, to be ultra-scalable while maintaining availability and performance.

Amazon.com started 10 years ago as a monolithic application, running on a Web server, talking to a database on the back end. This application, dubbed Obidos, evolved to hold all the business logic, all the display logic, and all the functionality that Amazon eventually became famous for: similarities, recommendations, Listmania, reviews, etc. For years the scaling efforts at Amazon were focused on making the back-end databases scale to hold more items, more customers, more orders, and to support multiple international sites. This went on until 2001 when it became clear that the front-end application couldn't scale anymore.

JG Was that performance scalability, or was it manageability, or facilities?

WV The many things that you would like to see happening in a good software environment couldn't be done anymore; there were many complex pieces of software combined into a single system. It couldn't evolve anymore. The parts that needed to scale independently were tied into sharing resources with other unknown code paths. There was no isolation and, as a result, no clear ownership.

At the same time, there was continued difficulty in the back-end database scaling effort. Databases--and by that time we were using several databases--were a shared resource, which made it very hard to scale-out the overall business. So both the front-end and back-end processes were restricted in their evolution because they were shared by many different teams and processes.

We went through a period of serious introspection and concluded that a service-oriented architecture would give us the level of isolation that would allow us to build many software components rapidly and independently. By the way, this was way before service-oriented was a buzzword.

Simple Roots, Complex Applications. Amazon has a simple approach to services, which allows it to recombine those services in a variety of applications and business offerings. These include Amazon's retail customer website, Amazon's Seller Marketplace, Amazon Web Services (AWS), and an e-commerce platform for the creation of independent e-commerce sites, such as Target and Sears Canada.

WV For us service orientation means encapsulating the data with the business logic that operates on the data, with the only access through a published service interface. No direct database access is allowed from outside the service, and there's no data sharing among the services.

Over time, this grew into hundreds of services and a number of application servers that aggregate the information from the services...

...The big architectural change that Amazon went through in the past five years was to move from a two-tier monolith to a fully-distributed, decentralized, services platform serving many different applications. A lot of innovation was necessary to make this happen, as we were one of the first to take this approach. Operating such a diverse set of services at this scale is not something that many people have done before, especially not with the kind of isolation that we wanted to achieve.

It has been a major learning experience, but we have now reached a point where it has become one of our main strategic advantages. We can now build very complex applications out of primitive services that are by themselves relatively simple. We can scale our operation independently, maintain unparalleled system availability, and introduce new services quickly without the need for massive reconfiguration.
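To make the idea concrete, here is a minimal sketch of what "encapsulating the data with the business logic" and "application servers that aggregate the information from the services" might look like in miniature. It is my own illustration, not Amazon's code: the class names (ReviewService, CatalogService), the build_product_page function, and the in-memory lists and dictionaries standing in for each service's private database are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Review:
    item_id: str
    stars: int


class ReviewService:
    """Hypothetical service: owns its data and exposes only published methods."""

    def __init__(self) -> None:
        # Private state; stands in for a database only this service may touch.
        self._reviews: List[Review] = []

    def add_review(self, item_id: str, stars: int) -> None:
        # Business logic lives next to the data it protects.
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._reviews.append(Review(item_id, stars))

    def average_stars(self, item_id: str) -> Optional[float]:
        stars = [r.stars for r in self._reviews if r.item_id == item_id]
        return sum(stars) / len(stars) if stars else None


class CatalogService:
    """Hypothetical second service with its own, unshared data."""

    def __init__(self) -> None:
        self._titles: Dict[str, str] = {}

    def put_title(self, item_id: str, title: str) -> None:
        self._titles[item_id] = title

    def get_title(self, item_id: str) -> Optional[str]:
        return self._titles.get(item_id)


def build_product_page(
    item_id: str, catalog: CatalogService, reviews: ReviewService
) -> Dict[str, object]:
    """Aggregator: composes a page using only the services' public interfaces."""
    return {
        "item_id": item_id,
        "title": catalog.get_title(item_id),
        "average_stars": reviews.average_stars(item_id),
    }
```

The aggregator never reaches into either service's storage, so each service could swap its list or dictionary for a real database without the page-building code changing.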

Lessons Learned. We often associate services with performance degradation. I found Amazon's story compelling because it offers a different take on that point.

WV The first and foremost lesson is a meta-lesson: If applied, strict service orientation is an excellent technique to achieve isolation; you come to a level of ownership and control that was not seen before. A second lesson is probably that by prohibiting direct database access by clients, you can make scaling and reliability improvements to your service state without involving your clients. Other lessons are related to how you access services: If you want to be able to aggregate services easily, if you want to insert advanced infrastructure techniques such as decentralized request routing or distributed request tracking, you need a single unified service-access mechanism.
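One way to picture the "single unified service-access mechanism" is a single call path that every request goes through, which gives you one place to attach tracking. The sketch below is only an illustration under my own assumptions: ServiceGateway, its register and call methods, and the trace_log list are hypothetical names. It is also centralized and in-process, so it shows the tracking hook rather than the decentralized routing Werner mentions.

```python
import uuid
from typing import Any, Callable, Dict, List, Optional, Tuple

Handler = Callable[[Dict[str, Any]], Any]


class ServiceGateway:
    """Hypothetical single entry point for all service calls."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Handler] = {}
        # Every call is recorded as (request_id, service, operation).
        self.trace_log: List[Tuple[str, str, str]] = []

    def register(self, service: str, operation: str, handler: Handler) -> None:
        self._routes[(service, operation)] = handler

    def call(
        self,
        service: str,
        operation: str,
        payload: Dict[str, Any],
        request_id: Optional[str] = None,
    ) -> Any:
        # Reuse the caller's request id so one user request can be followed
        # across several service calls; otherwise start a new one.
        request_id = request_id or uuid.uuid4().hex
        self.trace_log.append((request_id, service, operation))
        handler = self._routes.get((service, operation))
        if handler is None:
            raise LookupError(f"no handler for {service}.{operation}")
        return handler(payload)
```

Because every call passes through call(), adding tracking required no change to any individual service handler.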

In addition, Werner offers insights on the development impact. The "You build it, you run it" principle is worth thinking about.

WV Another lesson we've learned is that it's not only the technology side that was improved by using services. The development and operational process has greatly benefited from it as well. The services model has been a key enabler in creating teams that can innovate quickly with a strong customer focus. Each service has a team associated with it, and that team is completely responsible for the service--from scoping out the functionality, to architecting it, to building it, and operating it.

There is another lesson here: Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.

Later in the interview, Werner speaks to the importance of testing. I had a conversation with Solstice a couple of weeks ago on this very topic.

WV One of the biggest challenges is developing at this large scale. How do you make sure that developers are productive in this large distributed service-oriented architecture? If you have this decentralized organization where everybody is developing things in parallel, how can you make sure that all the pieces work together as intended, now and in the future? An important part of that is, for example, testing. How do you test in an environment like Amazon? Do we build another Amazon.test somewhere, which has the same number of machines, the same number of data centers, the same number of customers, and the same data sets?

How do I test against this pair of jeans with that size, which just happens to be out of stock but will be back in stock two days from now, such that the customer e-mail interaction can be tested correctly? How do I test that with the new version of the offer-services that will roll out next week on the Japan Web site but with a browser that is not complete in displaying the Kanji characters?

These are very hard questions, and we have no simple answers. Testing in a very large-scale distributed setting is a major challenge.
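This does not answer the large-scale question Werner raises, but the out-of-stock jeans scenario can at least be sketched at small scale: when a service is reachable only through its interface, a test can substitute a stand-in and control the date. Everything here is my own assumption for illustration, including FakeInventoryService, availability_email, the SKU, and the message wording.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class FakeInventoryService:
    """Hypothetical test double standing in for a real inventory service."""

    # SKU -> date it comes back in stock; a SKU not listed is in stock.
    restock_dates: Dict[str, date]

    def in_stock(self, sku: str, today: date) -> bool:
        restock = self.restock_dates.get(sku)
        return restock is None or today >= restock

    def restock_date(self, sku: str) -> Optional[date]:
        return self.restock_dates.get(sku)


def availability_email(sku: str, inventory: FakeInventoryService, today: date) -> str:
    """Builds the customer message; 'today' is passed in so tests control time."""
    if inventory.in_stock(sku, today):
        return f"{sku} is in stock."
    days = (inventory.restock_date(sku) - today).days
    return f"{sku} is out of stock; expected back in {days} days."


# Usage: jeans that are out of stock today and back in two days.
today = date(2006, 6, 1)
inventory = FakeInventoryService({"jeans-32x34": today + timedelta(days=2)})
message = availability_email("jeans-32x34", inventory, today)
```

The harder part Werner describes, combining real data sets, upcoming service versions, and browser quirks across many teams, is exactly what a stand-in like this leaves out.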

Innovation. I think this point is critical. Service-orientation (loose coupling, mix and match, non-invasive changes) provides a platform (ecosystem) for experimentation and business innovation.

WV If an idea is deemed worthy of investigation, we exploit our service development approach to scope and prototype the idea quickly. With a new radical service, you try to go into prototype mode pretty quickly, and then you start iterating on that until you feel that you understand your business problem. The small-team concept means that you have a continuous feedback loop where you try to understand the impact for the customer.

That's in general how requirements are being refined, with the customer in the loop. It is also very important to try to determine at the outset what the success criteria should be, and how they can be measured.

This fast response to new ideas is enabled through the loosely coupled services model, both in technology and at the developer and operations level. From the outside, the services in our platform may appear chaotic, but chaotic in a good sense--in that we try not to impose a rigid structure on the different functional pieces, but we expect there to be order when looking at it from a different dimension. Thinking about this whole system as a big deterministic system would be unrealistic. Life is not deterministic, and a large-scale distributed system such as Amazon has many organic and emerging properties that can come to life only if you do not constrain it.


This is a must-read blog post and ACM Queue article for every developer and architect.

It's a very good insight into what Amazon has done. I guess what strikes me is the maturity level of the organization. I'm not sure it's representative of the typical IT organization; okay, we know it's probably not.

The challenge for many IT organizations is how to get to that point. A fully distributed architecture that is highly componentized and interoperable, with developers who have ownership, all sounds great. It's a bit of the nirvana of computing and of SOA as a whole. The question is how do you get your organization to that level without making a bigger mess than you already have?

I do think one of the critical issues is governance. Without strong governance in place (which they seem to have), that architecture can become very brittle and very difficult to manage. Just drawing from my own experience with EAI platforms, it's very easy for those services to take on a life of their own if not properly governed. Once that happens, change becomes difficult, and agility along with reuse is lost.

Kudos to Amazon for setting the bar. It's good to see what the potential for Service Orientation can be in a real company.

Brenda Michelson, Principal of Elemental Links, shares her view on architectural strategies, technology trends, business, and relevance.


Brenda Michelson is the principal of Elemental Links, an advisory and consulting practice focused on business-technology capabilities that increase business visibility and responsiveness.

