Well, now that we’ve released the Yahoo! Mail Web Service, I suppose we have some explaining to do. In particular, I’d love to talk from time to time about the choices we made, why we made them, things we’ve learned and so on. For this first installment, I figured I’d do a discussion of our two endpoints: SOAP and JSON-RPC.
SOAP
I’m sure most of you will have your opinions of SOAP. Be that as it may, some people actually like it. You can’t deny that certain languages make it dead easy to consume SOAP web services. You feed it your WSDL and it poops out a wad of code for you to call the service as though it were a local call. If only it were that easy. In the end, you may end up dealing with some interoperability problems. During Open Hack Day, we made v1 of the web service available to users. When we released on Wednesday, we were at v1.1. Why? Because we discovered an interoperability issue in the WSDL that made it basically impossible to consume the WSDL using Visual Studio .NET or Axis2. In fixing it, the XML generated by the service was tweaked just enough that it was no longer compatible with the v1 output.
So, while SOAP is supposed to make your life easy, it’s not perfect.
JSON-RPC
We added SOAP first because we had to get Oddpost up and running on the Yahoo! Mail backend. Oddpost ran using SOAP, so we figured the fastest way to get them up and running was to come as close as possible to duplicating the existing service. Once we got rolling however, we realized that it wouldn’t be difficult to add new endpoint types as long as they were close enough in behavior to SOAP that we could map things easily.
JSON-RPC isn’t all that widely used, but its semantics are similar enough to SOAP that it was viable. A JSON-RPC request is a JSON object consisting of three properties: a method name, a list of parameters and an (optional) ID. That maps pretty well onto the WSDL. In addition, browsers love to consume it. A single method call (to either eval() or a strict JSON parser) and you can turn the string into a native JavaScript Object. Handy. No more futzing with the cumbersome DOM.
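To make the three-property shape concrete, here is a minimal sketch in Python of building a request and decoding a response. The method name ListFolders comes from the table in this post; the empty parameter object, the ID value and the contents of the response are assumptions made up for illustration, not the actual Yahoo! Mail payloads.

```python
import json


def build_request(method, param, request_id=None):
    """Build a JSON-RPC style request: method name, parameter list, optional ID."""
    request = {"method": method, "params": [param]}
    if request_id is not None:
        # The ID is optional; include it only when the caller supplies one.
        request["id"] = request_id
    return json.dumps(request)


def parse_response(body):
    """Turn the response string into native objects with a strict JSON parser."""
    return json.loads(body)


# Hypothetical call: the empty parameter object is an assumption.
body = build_request("ListFolders", {}, request_id=1)

# Hypothetical response: the result contents are invented for this sketch.
decoded = parse_response('{"result": {"folder": []}, "error": null, "id": 1}')
print(body)
print(decoded["result"])
```

One parse call gives you plain dictionaries and lists, which is the same convenience the browser gets from a strict JSON parser.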
But beyond simplicity, JSON-RPC has another benefit, one that becomes much more evident when you have to scale to a quarter of a billion users: efficiency. It may seem clear to some that parsing/generating JSON is more efficient than XML, but here are some numbers that make things a little clearer.
| Method Name | JSON-RPC Size | SOAP Size | % Difference |
|---|---|---|---|
| ListFolders | 929 bytes | 1,189 bytes | +28% |
| GetMessage | 9,926 bytes | 10,339 bytes | +4% |
| ListMessages | 35,401 bytes | 38,102 bytes | +8% |
JSON-RPC always beats SOAP. In the case of ListFolders (a very frequently called method), it beats SOAP pretty badly. Even in the other cases, where the payloads are larger and SOAP overcomes the penalty of having to transmit an envelope and namespaces, JSON-RPC still beats it by a decent margin (especially if you’re looking at things from the context of the server, where we’re serving requests for the quarter of a billion users mentioned earlier).
Say each user makes one request in a given day to each of the 3 methods mentioned above. The total combined difference between JSON-RPC and SOAP for a single user is only 3,374 bytes. Extend that out to a quarter of a billion users and you’re now talking roughly 785 gigabytes (about 843.5 billion bytes) per day. That’s close to a terabyte of data transfer saved doing nothing more than choosing a more efficient object encoding. Of course, it’s not entirely accurate because I haven’t accounted for HTTP compression, but you get the idea.
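The arithmetic is easy to check. This short Python sketch uses only the byte counts from the table and the quarter-billion user figure; the variable names are mine, and the gigabyte figure is computed in binary units (2^30 bytes).

```python
# (JSON-RPC bytes, SOAP bytes) per method, taken from the table above.
SIZES = {
    "ListFolders": (929, 1189),
    "GetMessage": (9926, 10339),
    "ListMessages": (35401, 38102),
}
USERS = 250_000_000  # a quarter of a billion users

# Extra bytes SOAP costs one user who calls each method once.
per_user = sum(soap - json_rpc for json_rpc, soap in SIZES.values())

# Scale to every user, then convert to binary gigabytes.
total_bytes = per_user * USERS
total_gib = total_bytes / 2**30

print(per_user)             # 3374
print(total_bytes)          # 843500000000
print(round(total_gib, 1))  # 785.6
```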
Set aside the network cost and also consider the parsing cost. In most cases you’re going to hold those bytes in memory while you’re parsing them or you’ll buffer them in memory after rendering them. More bytes in the object representation means more bytes in local memory that you’ll suck down just in handling the serialized objects.
I find it very strange that you’re discussing the relative merits of these different formats while not addressing the more fundamental mismatch between the interface and the web. Essentially RPC eliminates several key benefits of using HTTP.
When talking of efficiency, are the responses of your getter methods transparently cacheable on the web? To understand a basic error message, do you still have to parse the body of the message? If you decide to make a minor change to the method signature of one of your endpoints, would that break all existing clients? At a simple pragmatic level, can I point a web browser at the interface and see something useful? Can I post data into the system from an HTML form?
If the answer to any of these questions is no, I suggest checking out Paul Downey’s Good Web APIs are just Web Sites and Mark Baker’s Protocol Independence diagram.
So, to digest your question a bit, why didn’t we do REST? The short answer is in the description above: we had to support Oddpost. Additionally, we had to create some optimizations, namely in the form of the BatchExecute call. Call batching doesn’t fit well into the REST model. We could have tried to really force things into GETs to get something RESTish, but at the end of the day it would have been a perverse API that’s difficult to use and understand.
To answer your other questions:
* No, the responses of our methods aren’t very cacheable. To start out with, it’s personal data. We have a quarter of a billion users. Unless you have a cache set up for a pretty small subset, your hit ratio is going to be pretty miserable. Additionally, given our inbound delivery rates, your mailbox is probably updating quite frequently, making message bodies one of the few pieces of data that aren’t constantly updating. The actual benefit of caching is probably pretty low.
* As for error messages, it depends on how basic you’re talking. You can easily distinguish success from error by checking for HTTP 200 vs. !200. If you want more detail than that, then you have to parse the response. The response will indicate whose fault it was (yours or mine) and what the underlying error code is.
* Additions of optional parameters to the methods won’t break existing clients. Every method takes exactly one parameter. The parameter has numerous properties that indicate some facet of the execution of the method.
* No, you can’t point a browser at the service and see something useful (unless you consider a “hey, where’s my parameters” message useful).
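The error-handling answer above can be sketched in a few lines of Python: check the HTTP status first, and parse the body only when you want more detail. The function name and the shape of the error body here are hypothetical; the post only says the response indicates whose fault it was and an underlying error code, so the field names below are invented for illustration.

```python
import json


def classify_response(status, body):
    """Return ("ok", result) or ("error", detail) for one HTTP response."""
    if status == 200:
        # HTTP 200 means success; the body carries the method result.
        return "ok", json.loads(body)
    try:
        # Anything other than 200 is an error; parse the body for details.
        detail = json.loads(body)
    except ValueError:
        # The body was not valid JSON, so keep the raw text instead.
        detail = {"raw": body}
    return "error", detail


# Hypothetical error body: "fault" and "code" are assumed field names.
kind, detail = classify_response(500, '{"error": {"fault": "Server", "code": "Example"}}')
print(kind, detail)
```

A client that only cares about success or failure never has to touch the body on the error path.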
I’m fully aware of the benefits of REST and sticking to HTTP as a protocol instead of using it merely as a transport. There were some conscious decisions that sent us down this road. Realize that this is the service we had to build internally to support the Oddpost acquisition, so we weren’t exactly building from scratch.
I would love to see the community reach out and collectively tell us that they want REST. If I can get that kind of weight behind such a request, then it makes my job of selling it internally that much easier. I’d love to see us make a REST offering available.
Hi Ryan, thanks for leaving a message on my blog. Pardon my ignorance, but could you share an example of accessing the Yahoo! Mail API purely in JavaScript, without involving any backend scripting language?
Thanks
Hi Ryan,
This post was very informative. I enjoyed reading it. Is it possible to expose the same method in SOAP, JSON-RPC and XML-RPC? I develop in .NET, where you decorate a method with [WebMethod] to expose it in a web service. Is it possible to expose the same web service and the methods inside it in SOAP, JSON-RPC and XML-RPC? If that is not possible, should the methods be duplicated three times, once for each of SOAP, JSON-RPC and XML-RPC?
Thank You,
Vish
http://viswaug.wordpress.com
Vish, yes…it’s entirely possible to do without duplicating any code. I had started on an XML-RPC implementation at one point, but backed away since (at the time) there was no demand for it.