Well, now that we’ve released the Yahoo! Mail Web Service, I suppose we have some explaining to do. In particular, I’d love to talk from time to time about the choices we made, why we made them, things we’ve learned and so on. For this first installment, I figured I’d do a discussion of our two endpoints: SOAP and JSON-RPC.
SOAP
I’m sure most of you will have your opinions of SOAP. Be that as it may, some people actually like it. You can’t deny that certain languages make it dead easy to consume SOAP web services. You feed the tooling your WSDL and it poops out a wad of code for you to call the service as though it were a local call. If only it were that easy. In practice, you may end up dealing with some interoperability problems. During Open Hack Day, we made v1 of the web service available to users. When we released on Wednesday, we were at v1.1. Why? Because we discovered an interoperability issue in the WSDL that made it basically impossible to consume the WSDL using Visual Studio .NET or Axis2. In fixing it, we tweaked the XML generated by the service just enough that it was no longer compatible with the v1 output.
So, while SOAP is supposed to make your life easy, it’s not perfect.
JSON-RPC
We added SOAP first because we had to get Oddpost up and running on the Yahoo! Mail backend. Oddpost ran using SOAP, so we figured the fastest way to get them up and running was to come as close as possible to duplicating the existing service. Once we got rolling, however, we realized that it wouldn’t be difficult to add new endpoint types as long as they were close enough in behavior to SOAP that we could map things easily.
JSON-RPC isn’t all that widely used, but its semantics are similar enough to SOAP that it was viable. A JSON-RPC request is a JSON object consisting of three properties: a method name, a list of parameters and an (optional) ID. That maps pretty well onto the WSDL. In addition, browsers love to consume it. A single method call (to either eval() or a strict JSON parser) and you can turn the string into a native JavaScript Object. Handy. No more futzing with the cumbersome DOM.
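To make the three-property shape concrete, here is a minimal sketch in Python rather than browser JavaScript. The method name ListFolders appears later in this post, but the parameter shape and the ID value are made up for illustration and are not taken from the actual service definition.

```python
import json

# Hypothetical request: only the method name comes from this post.
# The params contents and the id value are assumptions for illustration.
request = {
    "method": "ListFolders",  # which operation to invoke
    "params": [{}],           # list of parameters (shape assumed)
    "id": 1,                  # optional ID used to match the response
}

# Serialize to the string that would travel over the wire.
wire = json.dumps(request)

# On the receiving side, a single strict-parser call restores a native object.
decoded = json.loads(wire)

print(wire)
print(decoded["method"], decoded["id"])
```

The same round trip in a browser would be one call to a strict JSON parser, which is the "single method call" described above.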
But beyond simplicity, JSON-RPC has another benefit. A benefit that becomes much more evident when you have to scale to a quarter of a billion users: efficiency. It may seem clear to some that parsing/generating JSON is more efficient than XML, but here are some numbers that make things a little clearer.
| Method Name | JSON-RPC Size | SOAP Size | % Difference (SOAP vs. JSON-RPC) |
| --- | --- | --- | --- |
| ListFolders | 929 bytes | 1,189 bytes | +28% |
| GetMessage | 9,926 bytes | 10,339 bytes | +4% |
| ListMessages | 35,401 bytes | 38,102 bytes | +7% |
JSON-RPC beats SOAP in all three cases. In the case of ListFolders (a very frequently called method), it beats SOAP pretty badly. Even in the other cases, where the payloads are larger and SOAP amortizes the penalty of having to transmit an envelope and namespaces, JSON-RPC still beats it by a decent margin (especially if you’re looking at things from the perspective of the server, where we’re serving requests for the quarter of a billion users mentioned earlier).
Say each user makes one request in a given day to each of the three methods mentioned above. The total combined difference between JSON-RPC and SOAP for a single user is only 3,374 bytes. Extend that out to a quarter of a billion users and you’re now talking about 843.5 billion bytes a day, or roughly 785 gigabytes if you count a gigabyte as 2^30 bytes. That’s close to a terabyte of data transfer saved by doing nothing more than choosing a more efficient object encoding. Of course, it’s not entirely accurate because I haven’t accounted for HTTP compression, but you get the idea.
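For anyone who wants to check the arithmetic, here is a quick sketch in Python. The sizes are copied from the table above; the one-call-per-method-per-day usage pattern is the same simplifying assumption made in the text, and the names SIZES, USERS and per_user_savings are just illustrative.

```python
# Payload sizes in bytes, copied from the table: (JSON-RPC, SOAP).
SIZES = {
    "ListFolders": (929, 1189),
    "GetMessage": (9926, 10339),
    "ListMessages": (35401, 38102),
}
USERS = 250_000_000  # a quarter of a billion users


def per_user_savings(sizes):
    """Bytes saved per user if each method is called once via JSON-RPC."""
    return sum(soap - jsonrpc for jsonrpc, soap in sizes.values())


saved_per_user = per_user_savings(SIZES)  # 260 + 413 + 2701
saved_total = saved_per_user * USERS      # bytes per day across all users

print(saved_per_user)                  # 3374
print(saved_total)                     # 843500000000
print(round(saved_total / 2**30, 1))   # 785.6 (gigabytes as 2^30 bytes)
print(round(saved_total / 10**9, 1))   # 843.5 (gigabytes as 10^9 bytes)
```

The gigabyte figure depends on which definition of a gigabyte you use: the 785 number falls out of dividing by 2^30, while dividing by 10^9 gives about 843.5.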
Set aside the network cost and also consider the parsing cost. In most cases you’re going to hold those bytes in memory while you’re parsing them or you’ll buffer them in memory after rendering them. More bytes in the object representation means more bytes in local memory that you’ll suck down just in handling the serialized objects.