Archive for the ‘Development’ Category

SimplePie is neither simple, nor pie

Saturday, August 5th, 2006

As I mentioned before, Bloglines sucks. So tonight I was trying to build my own aggregator in an attempt to free myself. As it turns out, freedom can be quite painful. My first stop was to get the basics working. I whipped up a quick OPML parser so I could import my feeds from Bloglines. Once I had that done, I needed a way to fetch the feeds so I could display them. I didn’t care so much about read/unread state (I’ll handle that later), I just wanted to get the very basic guts of a feed reader put together.

So now that I’ve got my list of feeds, I need something to fetch the RSS/ATOM and parse it. I’ve looked at a couple of PHP feed parsers in the past, but decided to give SimplePie a shot. As it turns out, not such a great idea. SimplePie touts their software as “So easy, even your grandmother could.” Sounded awesome, so I installed it. Seemed pretty simple, really. Download the tarball, unzip it, get out a single PHP file, include that in your PHP script and then call their library. It was easy…a little too easy.

I first noticed something was up when I output feed items that had embedded images. None of them were showing up. I figured it must be some anti-hotlinking stuff in action, so I browsed over to my feed (which has no hotlink protection). Same, images still didn’t show up. I viewed the HTML source and noticed all of my image URLs had been URL encoded and prefixed by “?i=”. This seemed really odd, so I double checked to make sure that wasn’t showing up in the feed XML. It wasn’t. For some reason, SimplePie was doing it. So I did some searches on their site. After a bit of stumbling about, I found this page. Interesting…they built in a mechanism to work around hotlink blocking. As I dug further, it turned out I had to create a whole other page to handle all image requests. That page would backpost requests to fetch the images and then serve them back to my feed reader.

So I created my image serving page and made the necessary tweaks to the SimplePie configuration to let it know about this new page. Now I ought to be seeing “images.php?i=” instead of “?i=”. I saved my changes and reloaded my browser. Nothing, the images were still broken. So I once again viewed the source. When I looked at the image tags there was still only “?i=”. No reference at all to images.php. What the hell? So I went back to my source and double checked everything, making sure it looked exactly as it did in the SimplePie documentation. Looked fine. Hmm…maybe my browser cache still has the old version of the page. So I did a Shift-reload to clear the cache. Still nothing. Hmm…cache, oh yeah…SimplePie has a cache. But…they wouldn’t dare cache the content transformations would they? So I opened up the cache file and sure enough, they were caching the image tag transformations. I removed the cache file, reloaded my browser and voila! The images popped up perfectly. So as it turns out, the SimplePie installation documents are missing this all important step for anybody who wants to look at something other than broken image icons.

As I thought about it, I realized I really didn’t want to work around hotlink protection at all. If people were so uptight about hotlinking that they added protection to their images to prevent it, who was I to ignore their wishes? Additionally, serving the images through images.php meant that it was counting against my bandwidth. I wasn’t really crazy about that notion either. So I searched through the SimplePie documentation trying to find anything that would let me disable the hotlink juju. Nada. Evidently you get the hotlink stuff whether you want it or not.

So, in conclusion:

  • simple = good
  • pie = good
  • SimplePie = not good

Up next I’ll check out Magpie. What’s up with using “pie” to name everything? Now I can’t stop thinking about pie.

Copy Messages around in Thunderbird

Monday, July 31st, 2006

Most people won’t care about this tip but hey…most people don’t read my blog either.

Working on mail, I have to shuffle test messages around all the time. That sometimes means fetching them from one account and storing them in another. Thunderbird provides a nice, simple mechanism for doing this. Set up source and destination mail accounts in Thunderbird. The destination account will have to be an IMAP account, because you can’t put mail into a mailbox using POP. You can fetch it, however, so the source account can be a POP account if that’s all you’ve got. Once the accounts are set up, find the message you want to copy. Select the message in the message list and drag it to the account and folder you want to copy it to (alternatively, right click on the message and use either “Copy To” or “Move To”). Voila!

If you’re bold and daring, here’s another tip. Say you want to take an existing message and modify the raw contents before copying it to another account. Easy. Find the message you want to modify and right click on it in the message list. You should get a context menu with an option that says “Save As…”. Select this and save the message to your computer. Now open the message with your favorite text editor and have at it. When you’re done, save your changes and go back to Thunderbird. In the “File” menu, select “Open Saved Message…”. Select the file you just modified and it pops up in its own window for you to read. Now, right click in the message body and look for “Copy To” in the context menu. Drill down into the account and select the folder you want to copy the message to. Easy as pie.

Obviously Mac users with single button mice will have to adjust the instructions involving right clicking. Get over it, you should be used to it at this point.

More Goodies from OSCON

Saturday, July 29th, 2006

Rasmus isn’t the only person making presentations at OSCON. Andrei (another Yahoo! working on the guts of PHP) did his PHP 6 presentation. One of the major features coming in PHP 6 is the built-in Unicode support. PHP 5 already has Unicode support, but it was never baked into the platform. In PHP 6, Unicode will be tied very strongly into the guts of everything.

That’s good news. One of the best things about Java is knowing that when you’re dealing with a string, you’re always dealing in Unicode. No matter where that string came from and no matter where it’s going, for right now it’s Unicode.

Check out Andrei’s slides from the presentation. My favorites are way down on slides 74 and 75 (I wonder how long his presentation was), the slides about transliteration in PHP 6. The short of it is, you can take a Unicode string in one script and transliterate it to a string in another script. The example Andrei gives in the slide transliterates the Japanese (I think) string “たけだ, まさゆき” to one of the Latin character sets, where it comes out as “Takeda, Masayuki”. He goes on to show you how to use transliteration to get a pronunciation of your name in another language. Very cool stuff. Just imagine if your mail reader could take mail sent to you from your Japanese penpal and transliterate their name in the “From” field to something pronounceable in English.

I guess I’m going to have to get a PHP 6 install set up on my laptop so I can play around a little.

Addition: Andrei also mentions Powell’s book store in Portland. I have to agree, take a bit of time and go get lost in that store (literally). You’ll have an awesome time.

Addition #2: The ICU project has a transliteration demo page up. Type some text into “Input”, select a source character set in “Source 1” (for English, pick “Latin”) and then select the desired output character set in “Target 1”. The result is, you find out how to pronounce the input in another language. For instance, my full name in Cyrillic is “Рыан Цхристопхер Кеннеды”. In Arabic, my full name is “ريَن كهرِستُپهِر كِننِدي”. Hopefully I can actually trust the demo…for all I know it’s spitting out “this guy’s impotent” in another language.

Niall Kennedy covers Rasmus’ PHP Presentation at OSCON

Friday, July 28th, 2006

Aside from having an awesome last name, Niall also has a nice writeup on Rasmus’ PHP presentation at OSCON. Rasmus breaks down taking an application from 17 requests per second to 1700 requests per second. Afterwards, he dives into web services in PHP, both accessing them and publishing them.

The presentation is good, very step by step with a lot of sample code along the way so you can see exactly what Rasmus is talking about. And there are no quantum leaps; it takes several small iterations before he finally gets the application from 17 to 1700.

There are things in the PHP language that annoy me, however I can’t ignore the fact that when I sit down to write PHP things just get done. There’s no build setup process to get started (assuming you already have a web server running PHP). Just open a file and start typing. There’s no edit-compile-run cycle, it’s just edit-run (unless, of course, you’re writing an extension). It also features some very classy OOP features, such as using the __call() method for overloading. I’ve been using that a lot lately to build some fun stuff.

As if that wasn’t enough, there’s been a recent development sure to make any low-level PHP developer very happy. Sara Golemon (now a Yahoo!) has released her excellent book, Extending and Embedding PHP. For any of you who have cursed the extension documentation on the Zend website, GO BUY THIS BOOK. It’s basically what Zend should have provided since day one. Sara’s done a very nice job explaining the Zend lifecycle and how you can easily hook into it.

Sweet filesystems

Sunday, June 25th, 2006

Scoble’s got a post where he shares his thoughts about why WinFS was yanked recently by MS. I think the minor shit storm going on in the comments is amusing. Reminds me of trying to crawl my way through Slashdot comments (crap, crap, crap, insightful, crap, crap, etc). But listening to all the bitching in the comments did remind me of one thing: that sweet mother of a filesystem that is BeFS.

Everybody I know has one of two reactions when I bring up my love affair with BeOS. It’s either “what’s BeOS?” or “what’s wrong with you?” Little do they know, BeOS had the sweetest filesystem. Here’s the easiest way I can explain why BeFS is so awesome. Every file on your computer has an associated type. It could be an MP3, an email message, a Word document or a video. Each file type has an associated set of metadata that goes along with it. For instance, an MP3 might have an artist, album, song title and duration. An email message might have a sender, recipients, subject, date and whether or not there are any attachments.

I can hear you all getting bored. You’re all saying, “yeah, so what…everything has metadata.” Yes, the only difference is that BeFS made it incredibly easy to store, access and search metadata. The metadata was indexed so you could search it fast, really fast. So imagine you’re on your desktop and you want to search for something. Let’s say you want to look for all of your Bjork songs. Easy as pie, just ask BeFS to find you all files with a metadata field named “Artist” with the value “Bjork”. Even if the filename is ABCD1234.mp3, BeFS is gonna find it. Filenames are irrelevant. Not only is BeFS going to find it, it’ll find it anywhere. You don’t need to have all of your MP3s in one “Music” folder. Because the entire filesystem is indexed, BeFS can find needles in a haystack in the blink of an eye. Want to find all the email sent from your boss? Come on now, at least make it difficult on BeFS. How about all email on the entire computer sent from your boss with the word “bonus” in the subject sent between 12/1/2005 and 12/31/2005? Just like searching for Bjork, BeFS chews it up and spits it out faster than you can say “refresh rate”.
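To make the idea of indexed metadata a bit more concrete, here is a toy in-memory sketch in JavaScript. It is not how BeFS was implemented, and the `AttributeIndex` class, the file paths and the attribute values are all made up for illustration. The point is only that once attributes are indexed, a query never has to look at filenames or walk folders.

```javascript
// Toy attribute index: attribute name -> value -> set of file paths.
// Everything here (class name, paths, attributes) is hypothetical.
class AttributeIndex {
  constructor() {
    this.index = new Map();
  }

  // Record every attribute of a file at the moment it is written.
  add(path, attrs) {
    for (const [name, value] of Object.entries(attrs)) {
      if (!this.index.has(name)) this.index.set(name, new Map());
      const byValue = this.index.get(name);
      if (!byValue.has(value)) byValue.set(value, new Set());
      byValue.get(value).add(path);
    }
  }

  // Look up files by attribute value; no directory walk, no filename match.
  find(name, value) {
    const byValue = this.index.get(name);
    const hits = byValue && byValue.get(value);
    return hits ? [...hits].sort() : [];
  }
}

const fsIndex = new AttributeIndex();
fsIndex.add("/downloads/ABCD1234.mp3", { Type: "mp3", Artist: "Bjork" });
fsIndex.add("/music/track02.mp3", { Type: "mp3", Artist: "Bjork" });
fsIndex.add("/mail/inbox/42", { Type: "email", From: "boss@example.com" });

console.log(fsIndex.find("Artist", "Bjork"));
```

The badly named ABCD1234.mp3 turns up alongside the file in the music folder because the lookup goes through the attribute, not the path.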

And because the searches are so fast, you can do some fun stuff. You can, for example, create “smart folders” that are really just a familiar way of exposing these filesystem searches as everyday folders (BeOS did that too). The speed of the filesystem search makes it feel just like opening up any other folder. Remember when you had to download someone’s special desktop search application to do this kind of stuff for you? You’d download it and it would spend the next 6 hours crawling your hard disk to build up a search index and when it was done you’d have to launch that application to perform the search and then another application to open the file when you found it. Well, all that was built into BeFS. BeFS was actively indexing all the time. There was no 6 hour long process to index everything. And if an application wanted to search for something, it just asked BeFS to do it.

This is what I miss the most about BeOS, the beautiful filesystem. I miss some other things, but BeFS is what I miss the most. It’s a shame to hear that MS is killing something (WinFS) that sounded an awful lot like BeFS. If it’s true and the two are very much alike, I’m not quite sure how Scoble thinks the web destroys the advantage a system like WinFS provides. If anything, WinFS provides a powerful tool to make the web experience all that much better. Think of all of the web services that are probably implementing metadata in a database by hand right now. Services like flickr with their plethora of photos, just begging to have all of the EXIF data indexed. Services like Yahoo! Mail/Hotmail/GMail all begging to have MIME headers indexed. Services like WordPress, YouTube and more that have a metric buttload of user data crying out to be indexed so it can be made more accessible. And as long as Scoble’s making the web the lynchpin, what about tagging? Everything on the web is taggable now. Filesystem metadata means you can have a metadata attribute called “Tags” that holds all the tags for a given file. Instantly every file on the computer is searchable by tags and the developer didn’t have to do a thing. Now what web service wouldn’t die to have all of that tagging and searching infrastructure already built for them? That’s a huge time sink for the developers that’s been taken care of in one shot by adding an advanced filesystem to the mix.

SOAP vs REST

Tuesday, May 2nd, 2006

I was reading an article about SOAP vs REST earlier tonight. The article has O’Reilly and Amazon both weigh in on SOAP vs REST and, predictably, SOAP gets smacked around quite a bit:

I think there are also some political aspects. Early in the web services discussion, I remember talking with Andrew Layman, one of the SOAP architects at Microsoft. He let slip that it was actually a Microsoft objective to make the standard sufficiently complex that only the tools would read and write this stuff, and not humans. So that was a strategy tax that was imposed by the big companies on some of this technology, where they made it more complicated than it needed to be so they could sell tools.

I’d say that’s a pretty fair assessment of SOAP. You could do it by hand (I know people who do), but at the end of the day you’ll just end up hating your life. You really need tools to help you out if you’re going to seriously party with SOAP. Honestly, for most applications I think SOAP is overkill. You won’t need 90% of what SOAP provides you and in the end you’ll still pay full price for using it. I’m sure somewhere, someone needs all that complexity…but the odds are pretty good you’re not that someone.

REST is touted as the simpler alternative. Unfortunately, depending on who you’re dealing with, REST isn’t always easy. If you’re dealing with REST fanatics, they’ll pick you apart on your use of HTTP, telling you that you’re not doing “pure REST”. The problem is, REST means different things to different people because it’s loosely defined (that’s intentional). But because it is loosely defined, you’ll be hand rolling much of the code needed to get the job done.

Personally, I don’t care one way or the other. Make it easy for me to develop against your stuff and I’m much more likely to use it. If I have to spend a bunch of time getting up to speed, I’m going to get bored and lose interest. It won’t matter if your service provides the neatest stuff in the world, if I swear uncontrollably and rip out my hair trying to use your service, you’ve failed.

Maps Hacking (volume 2)

Monday, February 27th, 2006

The other weekend I spent some time hacking together an HTML map view of my GPS running data using the Yahoo! Maps API. One thing I wanted to do was to try doing the same thing using the Google Maps API. Inspired once again to fiddle, I copied over the HTML file and tweaked it to use Google’s API.

There are two primary reasons I wanted to try out the Google API:

  1. Polylines. Google’s API will let me hand it a list of latitude and longitude coordinates and it will turn them into a nice line. The Yahoo! version I did had to plot a ton of points, which resembled a line. The Google version actually draws a nice line. As long as you have enough data points, you’ll get smooth turns as well.
  2. Satellite imagery. For most people, maps don’t need satellite imagery. Having a list of the streets is enough. But around my house and my work there’s a lot of trails to run on. In the Yahoo! version of my maps, you’d see me off running in the middle of nowhere. The thumbnail (click it for a larger view) shows one of those instances where I was off running in the Bay Area marshes. On the Yahoo! version, you have no idea where I’m at. But on the Google version with the satellite imagery, you can clearly see I’m running on a trail around what appear to be salt ponds.

The verdict? See for yourself. There’s no comparison. The Google polyline looks SO much better than the points plotted on the Yahoo! map. And the satellite imagery makes it easy to see trails off the beaten path. This could actually be really nice for annotating all of the bike trails in Folsom.

The Google API isn’t perfect, though. I had to trim down the number of points I used for the polyline. My longer runs have a ton of points (my 7 mile run has over 700 points). That’s simply WAY too many for GPolyline to handle (on my machine anyway). I’m currently trimming the list down to size by only plotting every third point. I tried plotting all of them and even only half of them; both caused my browser to choke and to pop up the “this script is taking too long” dialog box. Still, with a third of the points I’m able to get relatively smooth lines and curves.

There are also some things I just don’t like about the Google API. The restrictive key signup is kind of a pain. You sign up for an API key with the URL of the site you’re going to use it on. From there on out, that API key will only work with pages served from that site. It’s not that it’s hard to do that, but if I ever move my stuff off of beta.unclehulka.com to unclehulka.com, I’ll have to get another key and change it in my code. I also had to do some stuff in the HTML to make the polyline work properly in IE. Again, not terribly difficult, just a pain in the butt to have to remember (although really, I probably can’t blame this on Google…it’s an IE-ism).

In all, I think the Google Maps API is much better than the Yahoo! Maps API. Polylines are simply a must-have item for doing the kinds of things I’m doing with maps. And if you’re going to do anything involving areas that aren’t on streets or highways, you need the satellite imagery.

Hackathon weekend

Sunday, February 19th, 2006

Say you’re a relatively young engineer and you’ve got a three day weekend. You’ve also got a metric buttload of work to do. What would you do? Would you:

  • Spend the weekend working to catch up?
  • Enjoy the long weekend by going somewhere?
  • Stay in and mess around with various technologies?

If you picked the last one, then you’re as pathetic as I am this weekend. What’s wrong with us?

Anyway, this weekend I should be working. Instead, I chose to take part in my own, personal hackathon. The project? Integrate some GPS data with Yahoo! Maps using the Yahoo! Maps AJAX API.

Enter the Garmin

When I started training last year for a marathon, I purchased a Garmin Forerunner 301. The Forerunner straps to your forearm and records GPS tracking points and your heart rate. When you’re done running, you can export the GPS data from the watch using the USB cable. The training software that comes with the Forerunner (Training Center) has an export feature. That will get you all of your training data as XML. For a long time, I’ve been wanting to take the data from the watch and lay it over a Yahoo! Map.

Parsing the Data

The data exported by the Garmin provides you with a lot of information. For each run, it provides the total duration of the run, the distance and a series of what it calls Trackpoints. Each Trackpoint is essentially a GPS reading and consists of the time (down to the second), latitude, longitude and altitude. Once you parse out the Trackpoints, you have everything you need to begin plotting the data.

To get the data to the browser, I have a PHP script that parses the XML and generates JSON. The browser requests the run list on startup (that’s what gives you the list of runs on the left side). When you click on a run, the browser makes a request to get all of the Trackpoints for the given run.

All you need now is a map.

Rendering the Data

Following the directions on the Yahoo! Developer Network site for using the maps API, it’s pretty simple to get a map to show up on the page. I added links for each of the runs in my exported XML file. When you click on the link, it triggers the rendering of the route.

Each Trackpoint is plotted on the map using an overlay. You could use simple markers, but your running route will just look like a long series of pushpins. Not exactly what I was going for. I use a pushpin at the start and end of each route, so they’re clearly indicated. For all points in between, I use an overlay with a custom, 6×6 image (all blue). This makes the route stand out pretty well against the map itself.

The data contains about one Trackpoint for every 3-4 seconds. That’s a lot of points. While Yahoo! Maps is capable of handling that many points, I figured it best to throttle back the number of points I was plotting. While it renders fine on my machine, I wasn’t really sure how it would do on a slower system. So I only plot every third point.
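A minimal sketch of that kind of thinning, in plain JavaScript with no maps API involved. The `thinPoints` name and the `{lat, lon}` point shape are assumptions for illustration, and keeping the final point (so the end pushpin still lands on the last reading) is an extra not described in the post.

```javascript
// Keep every `step`-th point. The first point is always kept; the last
// point is appended if the stride skipped it (an assumption, see above).
function thinPoints(points, step) {
  if (step < 1) throw new RangeError("step must be >= 1");
  const kept = points.filter((_, i) => i % step === 0);
  const last = points[points.length - 1];
  if (points.length > 0 && kept[kept.length - 1] !== last) {
    kept.push(last);
  }
  return kept;
}

// Hypothetical trackpoints, roughly one reading every few seconds.
const track = Array.from({ length: 10 }, (_, i) => ({
  lat: 37.4 + i * 0.0001,
  lon: -122.0 - i * 0.0001,
}));

const thinned = thinPoints(track, 3);
console.log(thinned.length + " of " + track.length + " points kept");
```

Making `step` a parameter makes it easy to try every second or every fifth point on a slower machine without touching the plotting code.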

For convenience, when you select a run, the points are plotted on the map and then the map centers over the newly plotted route. This makes it a lot easier to find what was just plotted. Say you’ve currently got the map centered over Folsom, CA and the last run you clicked was in Sunnyvale, CA. Think you’ll ever find the points on your own? Not easily. I calculate the center by averaging the latitude and longitude from the points given. This doesn’t provide a perfect center, but it gets the job done well enough.

Issues

I did have a couple of issues along the way. At first the centering of the map wasn’t working at all. I clicked a run that was in Sunnyvale, CA and it centered the map over Los Banos, CA. As it turns out, I hadn’t accounted for drop outs of the GPS signal during my run. The export file still contains Trackpoints for those times (because it’s logging the time and heart rate), but my script was interpreting them as zero degrees latitude and zero degrees longitude, completely throwing off the averages and breaking the map centering.
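The centering fix can be sketched like this in JavaScript. The function names and the `{lat, lon}` point shape are hypothetical, and the sketch assumes a dropout shows up either as a missing position or as a position parsed to exactly 0, 0. It is not the original script.

```javascript
// A trackpoint counts as a real GPS fix only if both coordinates are
// finite numbers and it is not the 0,0 placeholder a dropout parses to.
function hasFix(p) {
  return Number.isFinite(p.lat) && Number.isFinite(p.lon) &&
    !(p.lat === 0 && p.lon === 0);
}

// Average latitude and longitude over the points that have a fix.
function routeCenter(points) {
  const fixes = points.filter(hasFix);
  if (fixes.length === 0) return null; // nothing to center on
  const sum = fixes.reduce(
    (acc, p) => ({ lat: acc.lat + p.lat, lon: acc.lon + p.lon }),
    { lat: 0, lon: 0 }
  );
  return { lat: sum.lat / fixes.length, lon: sum.lon / fixes.length };
}

// Hypothetical run with two dropouts in the middle.
const run = [
  { lat: 37, lon: -122 },
  { lat: null, lon: null }, // dropout: time and heart rate only
  { lat: 0, lon: 0 },       // dropout that was parsed as zeros
  { lat: 38, lon: -121 },
];

console.log(routeCenter(run));
```

Without the filter, the zero point would drag the average latitude from 37.5 down to 25, which is the kind of shift that puts a run in the wrong place.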

I also have an issue with the “Start” and “End” pushpin labels on the map. First, the little conversation bubbles that are supposed to draw around the labels don’t resize with the content. I’m not sure if I should be doing something or if the maps JavaScript should be handling that for me. Second, many of my runs start and end at the same place. That makes it difficult to see one or the other label in some of the route plots.

I’m also using the Connection Manager library from the recently released Yahoo! UI Library. I have to say, I really didn’t enjoy it. It’s not terribly verbose when things are wrong. I somehow managed to have it call my callback function over and over. It seems to have come from a bug in my JavaScript, but why that should trigger the UI library to call my callback over and over is beyond me.

I also didn’t like installing the library. You download a ZIP file and then you have to put some of the JavaScript files in your web directory and link to them using script tags in your HTML page. It tells you to include the YAHOO.js file along with the connection manager specific JavaScript file. Problem is, the main ZIP file comes with 14 copies of YAHOO.js! They’re not even all identical, but they are very close. I did a check between two of them and the only differences were in the code comments. Seems silly that there’s 14 copies of the file included. If that’s really supposed to be a generally reusable bit of code, there should probably only be a single copy.

The Final Product

I’ve put up the finished product if you’re interested in looking at it. You can see it at http://beta.unclehulka.com/map/. I checked to see that it will run in my available browsers. That includes IE 6 and Firefox 1.5 on Windows XP.

When the page initially comes up, you should see a list of my runs on the left (shows the date and the distance) and a Yahoo! Map on the right. Clicking on one of the runs should cause the map to plot the points of the run and then center the map over the route. Subsequent clicking on runs will cause the map to clear the points of the current run, draw the points for the new run and recenter the map once again.

What’s Next?

Who knows. Maybe this is all I’ll do with it. I’ve been thinking about moving it over to the Google Maps API. I prefer Yahoo! Maps more, but Google’s API would make drawing the route a lot easier since they actually have functions to draw lines instead of just dropping a ton of points on the page to create the semblance of a line. Additionally, their satellite imagery would allow me to see where my offroad routes go. If you look at some of the runs in Sunnyvale, you’ll see I’m way off the beaten path in the marsh (click on the 8-17-2005 run). With Yahoo! Maps it’s difficult to tell where on earth I was running. But if you look at it on Google Maps, you can clearly see I’m out on some trails in the wildlife preserve.

Look what I found

Tuesday, February 14th, 2006

Once again I’m working on my mathematical side project (it’s an unhealthy obsession). I’m writing a bunch of Java code for this. I know, “Java is slow.” Shut up: a) you’re wrong and b) it doesn’t matter. I’m trying to tackle a problem that algorithmically should take longer than this planet has left before the sun goes supernova. So my choice of language won’t really matter. The algorithm is what matters.

Anyway, I do a lot of stuff using java.math.BigInteger. BigInteger is slow. Even worse, it’s immutable. That means if I do something as simple as adding 1 to a BigInteger, it allocates another BigInteger to hold the answer. I do a lot of very minuscule operations on BigIntegers, so I’m positive my application is spending a ton of time allocating and garbage collecting BigIntegers.

As it turns out, Java has (in 1.5 anyway) a MutableBigInteger class (also in the java.math package). Unfortunately, it’s package private. That means only BigInteger gets to use it. Fortunately, Sun distributes the Java source with the JDK so I should be able to “borrow” that class by simply copying it to one of my packages. My code tends to throw BigIntegers away a lot. If I could switch to using a MutableBigInteger, I could throw them into a free list for later use, saving a lot of object creation and garbage collection overhead down the road.

I know, I could switch my code to C/C++ and use GMP. GMP is super fast, gives you mutable numeric types and is easy to use, I won’t deny that. But C/C++ makes me want to hang myself and most of the big number operations I need aren’t provided by GMP either (I have some odd requirements, don’t ask). I might as well use a language I enjoy and have good tools for.

If you had to work with really big numbers (560 bit and up), what language/library would you use? Multiplication, modulus and bit fiddling are absolute necessities. Speedy execution time would be nice, but right now development time is more important to me.

Speaking the native tongue

Sunday, February 12th, 2006

I’ve been interacting with computers for most of my life. I’ve been programming them for about the last 7-8 years. I figured I had a pretty good grasp of the language (no matter what you hear me screaming at computers, swearing isn’t actually their native language). This weekend I was reminded that computers don’t speak decimal. While it is easy and even natural to tell a computer to operate on decimal numbers, they really don’t like to do business that way. Case in point, division.

Say you want to convert meters to kilometers. Simple enough, take the number of meters you have and divide by 1000. For humans, we have a nifty shortcut for doing this…we just write the number down and move the decimal point three places to the left. No long division necessary. This is because we mentally picture/manipulate numbers in decimal (base 10). Each digit represents some power of 10. 1000 itself is a power of 10 (10^3), allowing you to perform the division by simply moving the decimal point three places to the left.

1234.0 meters = 1.234 kilometers

For computers, however, this isn’t nearly as simple. Computers look at the numbers in their binary form (base 2). That means each digit represents some power of 2. 1000 is not a power of 2, so you can’t do the quick decimal point swap-a-roo. You have to do the dreaded long division or repeated subtraction or whatever mechanism you (or your computer) prefer. Any way you slice it, you’ll never be as fast as you could be if you simply picked up the decimal point and moved it.

The nice thing is, you actually can pick up the decimal point and move it when working in binary…as long as you’re doing division by some power of 2. Say you want to divide by 16. Move the decimal point 4 places to the left (16 is 2^4). For example, divide eleven by two. Eleven in binary is 1011 and two is 10. As you can see, the decimal point just moves once to the left (because 2 is 2^1).

1011.0 / 10 = 101.1 (binary 101.1 is 5.5 in decimal)

So remember these things:

  1. When dividing by a number that is a power of the current base you’re working in, just shift the decimal point to the left. This works in any base (binary, decimal, octal, hexadecimal, etc).
  2. When performing math on a computer, remember that things that are easy for you because of your grasp of the decimal numbering system aren’t going to be easy for a computer, which only understands binary. If you can rework your code to do division and multiplication in base 2, you’ll be much better off.
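In code, moving the point in binary is a bit shift. Here is a small JavaScript sketch using BigInt so it also works for very large values. The helper names `divPow2` and `modPow2` are made up for illustration, and both assume non-negative inputs. Note that an integer shift throws away the fractional part, so eleven divided by two comes out as 5 rather than 5.5.

```javascript
// Divide a non-negative BigInt by 2^k: shift right by k bits.
function divPow2(n, k) {
  return n >> k;
}

// Remainder of a non-negative BigInt modulo 2^k: keep the low k bits.
function modPow2(n, k) {
  return n & ((1n << k) - 1n);
}

const eleven = 0b1011n;                // eleven, written in binary
console.log(divPow2(eleven, 1n));      // integer part of 11 / 2
console.log(modPow2(eleven, 1n));      // the bit that fell off the end

// The shift and mask agree with ordinary division and remainder.
console.log(divPow2(1234n, 4n) === 1234n / 16n);
console.log(modPow2(1234n, 4n) === 1234n % 16n);

// The same trick on a 561-bit number, with no long division involved.
const big = 1n << 560n;
console.log(divPow2(big, 4n) === 1n << 556n);
```

The bit that the shift drops is exactly what the mask returns, so the pair gives both quotient and remainder.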

In case you’re wondering why this came up, I’m working on an old problem where I was doing a bunch of very decimal centric operations on numbers. I shifted all that code to do things in a binary way and the speed improvement has been immense (not immense enough to make me rich yet, however).

dojo - good clean fun

Wednesday, February 1st, 2006

I played around with dojo a bit tonight, doing some work to prepare a demo for tomorrow (holy crap, it nearly IS tomorrow). I’m only using the drag and drop stuff at the moment, but I have to say, it’s awesome. With very few lines of code, I was able to set up drag sources (the things you pick up and move around) and drop targets (the things you drop other things onto). Override the drop handler and voila! Instant DND implementation handled in roughly 8-10 lines of code.

var drag = new dojo.dnd.HtmlDragSource(dojo.byId("dragid"), "thing");

var drop = new dojo.dnd.HtmlDropTarget(dojo.byId("dropid"), ["thing"]);

drop.onDrop = customDropHandler;

That’s pretty much it. The first line makes something draggable. You pick the thing to drag by its ID. The “thing” can be whatever you want to call it. It’s sort of a drop classification. You use that in the second line when creating the drop target. The drop target is also picked by its ID. Again you see “thing” show up. This means that the drop target will accept any draggable in that classification. That allows you to have drop targets that only accept certain draggables on the page, in case you have a bunch of different drag types.

Dojo is pretty much just what the doctor ordered for people like me. People who know their way around JavaScript well enough to make something happen, but who aren’t adept enough to make it do anything flashy on their own.

Update: The complete lack of documentation and sample code for the majority of dojo is a major turnoff. While I was able to stumble my way through drag and drop, using the widgets has been a disaster. The demo seems to work fine and I try to duplicate what it’s doing. But no matter what, I just can’t get the widgets to do anything.

When is a leak not a leak?

Thursday, January 26th, 2006

At more than one interview I’ve been asked, “can you have a memory leak in Java?” Technically the answer is, no. Truthfully the answer is, sort of. Java can’t leak memory in the classic sense that languages like C and C++ can. In those languages (and others) it’s possible to allocate memory, drop it on the floor and lose it until the process exits. You can’t do that in Java. If you drop it on the floor, the garbage collector will find it. But you can put it somewhere and forget about it…like in a list or a map. It’s technically not a leak, because your application is still “using” it. But for all intents and purposes it walks, talks and acts like a leak.

There’s an article on IBM’s developerWorks site discussing leaks in Java. It actually uses a term I think is more accurate, “loitering.” The memory hasn’t been leaked, and it’s not doing anything…but it’s still hanging around. The solution, in Java, is to use references. More specifically, weak and soft references. They allow the garbage collector to collect objects, even if they’re still referenced…as long as those references are only weak or soft references.
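As a rough sketch of the weak reference idea, here’s a tiny cache built on java.lang.ref.WeakReference. WeakCache is a hypothetical class of my own, not something taken from the developerWorks article. The map holds its values only through weak references, so once nothing else in the application points at a value, the garbage collector is free to reclaim it and the entry stops loitering.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a cache that does not keep its values alive.
final class WeakCache<K, V> {
    private final Map<K, WeakReference<V>> entries = new HashMap<>();

    // Store the value behind a weak reference instead of a strong one.
    void put(K key, V value) {
        entries.put(key, new WeakReference<>(value));
    }

    // Returns the cached value, or null if it was never stored
    // or has already been garbage collected.
    V get(K key) {
        WeakReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was collected; drop the stale map entry.
            entries.remove(key);
        }
        return value;
    }
}
```

Note that this sketch only cleans up a stale entry when somebody looks it up, and the keys are still held strongly. If it’s the keys you want collected, java.util.WeakHashMap holds its keys weakly out of the box.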

Fun stuff, reminds me of why I miss Java sometimes.

SQLite - a fun time to be had by all

Sunday, January 15th, 2006

I’ve been messing around with SQLite a little in the last couple of weeks. SQLite is an embedded SQL database. It stores all of your database in a single file and it’s super light. That’s nice for doing prototypes or for just coding on the side. The data I pulled together for my 2005 financial year in review is actually in a SQLite database. I didn’t have to install any huge-mongous software packages (no MySQL or Oracle). Even better, if I ever want to back up, transport or replicate the database…it’s just a file copy.

There’s a PHP binding that I’ve been using. It works pretty well for the little one-off projects I’ve been tinkering on lately. I could probably just as easily use XML or some other file format to do this, but SQLite is already available in PHP 5 and it enables me to do some interesting data mashing by providing joins for free. There’s even a command line client that works like the MySQL/Oracle command line clients, allowing you to “connect” to your database and run queries against it. Evidently there’s also transaction support, although I haven’t done anything with that yet.

Update: Another fun trick for you PHP users, combine file_get_contents()/file_put_contents() and serialize()/unserialize(). Using this, you can easily write PHP objects to disk and read them back into memory. This is handy when all you’re storing is a simple object, list or map.

// Get the object from disk.
$obj = unserialize(file_get_contents("foo.txt"));

// Change the object.
$obj->foo = "bar";

// Save the object.
file_put_contents("foo.txt", serialize($obj));

Just remember, when you get the object you’ll have to read in the entire file. So if you have a large dataset, SQLite might be a better bet since it can selectively pull data from the file. Also, SQLite gives you things like “GROUP BY” and “ORDER BY” that you’d have to implement yourself if you go with the serialize/unserialize solution.

Hackathon December 2005 complete

Thursday, December 29th, 2005

Months ago we had an internal programming contest at Yahoo! to see who could build the coolest plugin for the Yahoo! Music Engine. A bunch of people entered and three won prizes…cold, hard, cash. I tried getting something together for the contest, but I fell short of getting anything decent running in time.

So when the latest contest was announced, I told myself I would NOT be missing it. The deadline is tonight (12/29) at midnight and I submitted my entry about 15 minutes ago. It feels great to have written something in my spare time. I haven’t really done that in a while. I wrote the Java search SDK for the Yahoo! Developer Network, but that’s different. I wrote that for a bunch of other people to use. I really don’t use it myself. The thing I wrote over the past week or two I’ll actually use, which probably explains how I managed to actually finish it. I think this fulfills my earlier idea to just start up my own hack day. Okay, it was more than a day (a lot more), but I’ll take it.

I think the rules of the contest prevent me from talking openly about what I’m working on right now, but I think that’s only because of the blind judging system they’ve set up. After the judging is done I’ll be able to talk more about it and even share it with anyone interested.

Now I just need to figure out what to hack on in January. ;)

JSONP…you’re joking, right?

Monday, December 12th, 2005

I understand people are hard up for their mashups, but this is crazy. Someone has decided it would be a good idea to give a name to a browser security exploit in the name of wider adoption. JSON with Padding (JSONP) is essentially a way of working around the cross domain security policies enforced by your web browser. Those security policies are in place to ensure that malicious code can’t report your private data to third parties and to prevent third parties from messing with your data on other web sites.

The problem is, there’s a hole in the security model. Using <script> tags, you can work around the policies to execute code from another web site. That code is, in turn, free to do whatever it wants to do on your behalf on the website you’re currently browsing. Imagine, for instance, you’re browsing eBay. Now imagine that eBay includes a bit of JavaScript from one of their sellers’ web sites like this:

<script src="http://reallynastyhackers.com/ownyou.js"></script>

Oh sure, the seller says it’s just so they can do a little metrics gathering. Unbeknownst to you, the JavaScript ends up using your account to submit glowing feedback on the seller. Or maybe it submits a password reset request, locking you out of your account. Who knows, the point is, you just executed code from someone you don’t know. You probably did it without even knowing it happened. It’s like the days of Word macro viruses.

The point is, these JSONP loons are hitchhiking a ride on a security flaw in the browsers. I’d guess (hope might be more accurate) it won’t be long before the browser developers put in a fix, at which point all of your JSONP code will cease to work. I do think there needs to be an easy way to let page scripts talk to multiple hosts, but I think this is the wrong way of doing it. There has to be some form of protection added to ensure that arbitrary code from the other hosts isn’t executed within the page.

The end of WordPress on unclehulka.com

Wednesday, August 17th, 2005

I think this sums up why I’ll be getting off of WordPress at some point. I’ve posted before about how concerning it has been to me that they’ve had so many security updates lately. Now it comes out that they released 1.5.2, were informed of a security issue in 1.5.2 and overwrote the old 1.5.2 tarball with a new one containing a patch.

This is just an example of irresponsible engineering and makes me wary of what’s really lurking under the hood. If they can’t be bothered to have good versioning practices, how can I be sure they’ve taken any time to engineer a quality piece of software?

Don’t get me wrong, I’ve enjoyed using WordPress and I think it’s a great program. Shame it seems to be a bit of a mess behind the scenes.

Update: If this is true then I’ll be a bit happier about what happened with regard to 1.5.2’s versioning. I’m still wary of all the security updating going on, though. There just seems to be way too much of that lately.

Yahoo! Developer Network, keeping me busy

Wednesday, June 29th, 2005

The Yahoo! Developer Network recently unleashed a couple of new services: the new maps API and the MyWeb 2.0 services. There’s really no work to be done for the maps API, but I’ll have to update my Java Yahoo! Search SDK to include the new MyWeb 2.0 functionality. Those guys are busy bees…I’m having a hard time keeping up with their new APIs.

I’m actually pretty excited about playing with the MyWeb 2.0 stuff. Sure, it’s a lot like what del.icio.us already has, but now it’s being offered by my company. del.icio.us is neat and all, but Y! has an entire suite of tools I’ll be able to integrate the MyWeb stuff with.

If you want to play around with MyWeb 2.0 but don’t have access to the beta yet, I allegedly have 95 invites to give out (it says I already gave out 5, but I don’t remember inviting anyone). Drop me a line at spamhaighter-myweb2@yahoo.com (or my personal address if you happen to know it) with your email address (I’m guessing it will have to be a Y! email address) and I’ll send you one (assuming I have any left by then).

Extended Filesystem Attributes

Tuesday, June 21st, 2005

Linux.com is running an article on extended attributes in the Linux filesystem. The gist of it is, you can store named attributes on a per-file basis and query them later on. This is great and all, but it could use some serious improvement.
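For a feel of what “store a named attribute and query it later” looks like from code, here’s a minimal Java sketch using the standard java.nio.file.attribute.UserDefinedFileAttributeView. This is my own illustration, not something from the Linux.com article. FileTags, writeTag and readTag are hypothetical names, and whether the write succeeds depends on the filesystem actually supporting user attributes, so both helpers report failure instead of throwing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.util.Optional;

// Hypothetical helper for storing small string attributes on a file.
final class FileTags {
    // Store a named attribute on the file. Returns false if the
    // filesystem does not support user-defined attributes.
    static boolean writeTag(Path file, String name, String value) {
        try {
            UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
            if (view == null) {
                return false;
            }
            view.write(name, ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (IOException | UnsupportedOperationException e) {
            return false;
        }
    }

    // Read a named attribute back. Empty if it is missing or unsupported.
    static Optional<String> readTag(Path file, String name) {
        try {
            UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
            if (view == null) {
                return Optional.empty();
            }
            ByteBuffer buffer = ByteBuffer.allocate(view.size(name));
            view.read(name, buffer);
            buffer.flip();
            return Optional.of(StandardCharsets.UTF_8.decode(buffer).toString());
        } catch (IOException | UnsupportedOperationException e) {
            return Optional.empty();
        }
    }
}
```

Something like FileTags.writeTag(path, "artist", "Some Band") followed by FileTags.readTag(path, "artist") is about as far as it goes, though. There’s no index and no search here, just raw reads and writes.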

BeOS was the first operating system I ever used that had support for filesystem attributes and used them well (I think MacOS used filesystem attributes, but I never saw them being used for anything useful). BeOS had a base set of attributes that every file had. This included things like the creation date, last modification date, content type and more. The content type was used by BeOS to handle automatic decoding of various filetypes on behalf of the application. So if your application knew how to handle video generically, BeOS would handle decoding a (for example) MPEG file into “video” for the application.

Each file type extended the base set of attributes. For example, music files would add attributes for the title, artist, bitrate and more. Mail files would add attributes for the sender, recipient, subject and whether or not there were attachments. Additionally, BeOS indexed all of the attributes and provided a search interface as well as a search API. That made it incredibly easy and fast to do searches of all your files. Searches could even be saved as folders. That made it easy to have a folder on your desktop called, “Mail from Ryan Kennedy.”

While it’s great that Linux has filesystem attributes, it would be better to see the attributes more tightly integrated with the rest of the OS.

Flickr Authentication API

Thursday, June 16th, 2005

This is so cool, Flickr has an API to allow 3rd party apps to act on your behalf. This includes desktop applications and web applications. I’m so happy Y! chose to buy Flickr. These seem to be right-thinking people who get the power of opening up their system to those who like to tinker. I’m even more happy that these creative people still have the ability to release new and exciting things like this even after being acquired by a considerably larger company.

Operator overloading can be nice

Saturday, June 11th, 2005

I was doing some “light” crypto hacking this evening. I’ve used GMP in the past for this type of work, but always the C binding. Tonight I decided to take my recently beefed up C++ knowledge for a spin and used the GMP C++ wrappers.

Java’s BigInteger is a pain in the ass to use. If you want to multiply numbers and store the result, you have to explicitly call a function. So multiplying A by B and storing it in C looks like this:

BigInteger A = BigInteger.valueOf(3);
BigInteger B = BigInteger.valueOf(5);
BigInteger C = A.multiply(B);

That doesn’t look like math at all. In fact, it’s a pain in the ass to decipher your mathematical formula into code that looks like that, especially when your formulas get increasingly complicated. Here’s what it looks like using GMP with the C++ binding:

mpz_class A = 3;
mpz_class B = 5;
mpz_class C = A * B;

Aside from the awkward naming of the big integer classes in GMP, this is a HUGE improvement. You code it exactly as you would if you were to use the primitive integer types. In fact, you can even use the primitive types interchangeably:

int A = 3;
mpz_class B = 5;
mpz_class C = A * B;

The only way it could be a more seamless experience is if C++ had implemented arbitrary precision numbers in the language itself so I didn’t have to utilize a separate type.
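To show how quickly the method-call style gets away from you, here’s a small Java sketch of a slightly bigger formula, (a * b + c) mod n. BigFormula and mulAddMod are hypothetical names of mine. With the GMP C++ wrappers the body would read roughly as (a * b + c) % n for non-negative values.

```java
import java.math.BigInteger;

// Hypothetical illustration of a formula written with BigInteger method calls.
final class BigFormula {
    // Computes (a * b + c) mod n. Each operator becomes a chained method call.
    static BigInteger mulAddMod(BigInteger a, BigInteger b, BigInteger c, BigInteger n) {
        return a.multiply(b).add(c).mod(n);
    }

    public static void main(String[] args) {
        BigInteger result = mulAddMod(
            BigInteger.valueOf(3),
            BigInteger.valueOf(5),
            BigInteger.valueOf(2),
            BigInteger.valueOf(7));
        System.out.println(result); // (3 * 5 + 2) mod 7 = 17 mod 7, prints 3
    }
}
```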