Solving the Twitter reliability problem

[Image: the Twitter Fail Whale]

Twitter is a web application that is simultaneously seemingly useless, highly addictive and surprisingly useful. It is also the single most unreliable web application that I have ever used.

When the World Cup was on, the infamous “Fail Whale” was a common sight due to the number of people tweeting about their favourite team. But even now, with all that consigned to history, Twitter errors are still an extremely common occurrence.

These problems only show one thing: Twitter is broken.

It is clear that Twitter has some serious internal architecture issues, which they are no doubt scurrying to fix. But considering the recent concerns in the web community over Facebook’s privacy and copyright policies, and the inherent single point of failure that comes with any centralised system, I feel that it is time that an open, decentralised alternative was developed.

I doubt anyone has actually noticed, but I have not been online much over the past two weeks. During this time I have been researching and developing just such an application and, after many annoying bugs, I have finally completed a working proof of concept.

The code

The code, written in clean MVC-structured PHP, can be found on my GitHub page. You can download it from that page using the “download source” button, or by cloning it using Git. This application does nothing to improve the reliability or performance of Twitter itself; that is impossible because Twitter is a proprietary system. Instead it is a complete ground-up reimplementation of the same concepts, based around a distributed architecture. Please note that little attention has been paid to the visual styling of the application.

Unlike the existing social networking systems, this system does not have any concept of a central server. It is based around the idea of a distributed mesh, where each user runs their own node on their own server. Combined with remote caching to reduce network traffic, this allows the system as a whole to handle a lot of traffic, while remaining fast from a user’s point of view. For proof that such an architecture works, you need look no further than email.

Some other advantages of a distributed system include:

  • You control your own data, so nobody can sell it to third parties unless you decide to do so yourself.
  • Consequently, only you can censor your posts. Nobody can delete anything because it breaks some usage policy.
  • There is no need for complex load management systems, which keeps the code simple. Simple systems are less likely to suffer from serious security vulnerabilities and are easier to maintain.
  • The protocol is open and you have direct access to the database, completely eliminating the problem of vendor lock-in.

How it works

The design of this system is loosely based on RSS, a format for collecting the content from multiple websites and compiling it into an easy-to-read stream. Each node is a combined server and client; any messages posted by a user are made available as an XML stream. The URL at which this stream is exposed is called the “follow URL”.
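To make the idea concrete, here is a rough sketch of such a stream, written in Python rather than the PHP of the actual code. The element and attribute names (`stream`, `message`, `posted`) are invented for illustration and should not be taken as the format the real application uses.

```python
import xml.etree.ElementTree as ET


def build_stream(username, messages):
    """Serialise a user's messages as a small XML stream (made-up layout)."""
    root = ET.Element("stream", {"user": username})
    for posted_at, text in messages:
        item = ET.SubElement(root, "message", {"posted": posted_at})
        item.text = text
    return ET.tostring(root, encoding="unicode")


def read_stream(xml_text):
    """What a following node does after downloading a follow URL."""
    root = ET.fromstring(xml_text)
    return [(m.get("posted"), m.text) for m in root.findall("message")]


if __name__ == "__main__":
    xml_text = build_stream("alice", [("2010-08-01T12:00:00Z", "Hello, mesh!")])
    print(xml_text)
    print(read_stream(xml_text))
```

In a real node `build_stream` would sit behind the follow URL and `read_stream` would run on whatever a following node downloads from it.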

To follow a user on a remote node, the software could just download the stream from the follow URLs of the users that you are interested in, just like RSS. However, there is a problem with this simple model: how to fill in the “followers” list on the remote node.

On its own, this is a purely pull-based protocol, meaning that the clients (all of the users that are following another user) always pull the stream from the followed user’s server. The only way to generate follower statistics from such a system is to count the number of unique requests hitting a URL in a given period of time. Needless to say, this is extremely unreliable, as anyone who has used FeedBurner to count RSS subscribers will know.
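A small sketch shows why counting requests gives such a poor estimate. It assumes a request log of (address, timestamp) pairs; the log format, the addresses and the one-day window are all made up for the example.

```python
from datetime import datetime, timedelta


def estimate_followers(requests, now, window=timedelta(days=1)):
    """Count the distinct requester addresses seen within the window."""
    cutoff = now - window
    return len({address for address, when in requests if when >= cutoff})


if __name__ == "__main__":
    now = datetime(2010, 8, 1, 12, 0)
    log = [
        ("203.0.113.5", now - timedelta(hours=1)),   # follower A, via an office proxy
        ("203.0.113.5", now - timedelta(hours=2)),   # follower B, same proxy
        ("198.51.100.7", now - timedelta(hours=3)),  # follower C
        ("192.0.2.44", now - timedelta(days=3)),     # follower D, polls rarely
    ]
    # Four real followers, but only two distinct addresses fall inside the window.
    print(estimate_followers(log, now))
```

Followers sharing a proxy are merged into one, and anyone who polls less often than the window is missed entirely.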

To solve this problem the system uses pingbacks, small XML messages which instruct a remote server to do something. When a user on one node follows a user on a different node, the software downloads the remote user’s follow URL, extracts the pingback URL (a piece of metadata contained in the stream), and then sends an “add follower” ping to the remote node. The remote node then adds the local user to the remote user’s followers list.
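The follow handshake can be sketched roughly as follows. This is Python for illustration only, with no networking, and the XML layout (`pingback`, `ping`, `name`, `follow_url`, the `add-follower` action) is an assumption rather than the format used by the real code.

```python
import xml.etree.ElementTree as ET


def extract_pingback_url(stream_xml):
    """Pull the pingback URL out of a downloaded stream."""
    node = ET.fromstring(stream_xml).find("pingback")
    if node is None or not node.text:
        raise ValueError("stream has no pingback URL")
    return node.text.strip()


def build_add_follower_ping(follower_name, follower_follow_url):
    """Build the small XML message sent to the remote pingback URL."""
    ping = ET.Element("ping", {"action": "add-follower"})
    ET.SubElement(ping, "name").text = follower_name
    ET.SubElement(ping, "follow_url").text = follower_follow_url
    return ET.tostring(ping, encoding="unicode")


def handle_ping(followers, ping_xml):
    """Remote side: record the new follower (name -> follow URL)."""
    ping = ET.fromstring(ping_xml)
    if ping.get("action") != "add-follower":
        raise ValueError("unsupported ping action")
    followers[ping.findtext("name")] = ping.findtext("follow_url")
    return followers


if __name__ == "__main__":
    stream = ('<stream user="bob">'
              "<pingback>http://bob.example/pingback</pingback>"
              "</stream>")
    print(extract_pingback_url(stream))
    ping = build_add_follower_ping("alice", "http://alice.example/follow")
    print(handle_ping({}, ping))
```

A real node would POST the ping to the extracted URL; here the remote side is just a function call so that the example stays self-contained.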

Unfollowing is achieved by basically doing the opposite, but it’s a little more complex because of security. When one user follows another, the following user’s public key is stored on the followed user’s node. The “remove follower” pingback is signed with the following user’s private key, and can only be validated using the public key stored on the remote node. This means that a follower can only be removed by the user that created it.
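The signed unfollow can be illustrated with the toy below. To keep it runnable on the Python standard library alone it uses textbook RSA with small fixed primes and no padding, which is not secure and is not what the real code does; a real node would use a proper cryptography library. The message format and the `Node` class are likewise invented for the example.

```python
import hashlib

E = 65537  # public exponent


def make_keypair(p, q):
    """Return (public, private) textbook RSA keys for primes p and q."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, E), (n, d)


def digest(message, n):
    """Hash the message down to a number below the modulus."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n


def sign(message, private_key):
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(message, signature, public_key):
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


class Node:
    """The followed user's node."""

    def __init__(self):
        self.followers = {}  # follower name -> public key stored at follow time

    def add_follower(self, name, public_key):
        self.followers[name] = public_key

    def remove_follower(self, name, signature):
        key = self.followers.get(name)
        if key is None or not verify("remove-follower:" + name, signature, key):
            return False  # unknown follower or bad signature: ignore the ping
        del self.followers[name]
        return True


if __name__ == "__main__":
    # Toy primes (Mersenne primes), far too small for real use.
    alice_public, alice_private = make_keypair(2**89 - 1, 2**127 - 1)
    _, mallory_private = make_keypair(2**61 - 1, 2**107 - 1)

    bob = Node()
    bob.add_follower("alice", alice_public)
    print(bob.remove_follower("alice", sign("remove-follower:alice", mallory_private)))
    print(bob.remove_follower("alice", sign("remove-follower:alice", alice_private)))
```

The point is only the shape of the check: the remote node keeps the key it was given at follow time and refuses any removal that does not verify against it.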

The last major feature is the ability for users to send messages to one another using at tags (@username). In order to send a message to a remote user it is necessary to know the follow URL that corresponds to the user name. As there is no central user list, this is obtained by looking through the user’s followers and following lists. This means that it is only possible to message people who you are following, or who are following you.

Sending the message itself is done using another type of pingback that adds the message to the remote user’s message inbox.
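The name lookup described above might look something like this. It is a sketch only: the two lists are assumed to be simple name-to-follow-URL mappings, and the regular expression used to spot at tags is my own simplification.

```python
import re


def resolve_follow_url(username, following, followers):
    """Look a user name up in the local lists; there is no central directory."""
    return following.get(username) or followers.get(username)


def route_message(text, following, followers):
    """Split the at tags in a message into deliverable and unknown recipients."""
    deliverable, unknown = {}, []
    for username in re.findall(r"@(\w+)", text):
        url = resolve_follow_url(username, following, followers)
        if url:
            deliverable[username] = url
        elif username not in unknown:
            unknown.append(username)
    return deliverable, unknown


if __name__ == "__main__":
    following = {"bob": "http://bob.example/follow"}
    followers = {"carol": "http://carol.example/follow"}
    print(route_message("@bob @carol @dave lunch?", following, followers))
```

Each resolved follow URL would then be fetched to find the pingback URL, and the message pingback sent there. Names that cannot be resolved simply cannot be messaged.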

Some features are impossible to implement efficiently in a distributed system. These are the features which require access to all of the users or messages on the entire network. Consequently it is impossible to implement a “find user” feature or global message searching.

The former I do not see as a major problem, because users normally find other users from external websites or the followers/following lists. The latter matters more: hash tags are an important feature, as they allow discussions to form around a topic. It should be possible to implement something that creates this effect without the need for a centralised data store.

Static websites, just don’t go there

If you want to create a website and are thinking about creating a static HTML site with something like Dreamweaver, stop right there. The web has evolved far beyond the days of static HTML, allowing absolutely anybody on the planet to easily create and maintain high quality websites. This is thanks to the evolution of the Content Management System (CMS).

Static websites are an absolute nightmare to maintain. Just adding one page is a major task: you have to download every HTML file, add the page, go through all of the other pages and update the links, then re-upload the entire site. If you are using a CMS, on the other hand, this whole task is as simple as logging into your CMS’s administrator interface and clicking on the “new article” button, at which point you can add your new page and edit it using a simple WYSIWYG (visual) editor right in the browser. The software does everything else for you.

A CMS-based website is also a lot easier to create in the first place. All you need to do is install the software on a server and add your content using the CMS’s web interface. Lastly you just pick or create a template for the visual styling. Simple. If you went down the static route, your life would be a lot harder. First you would need to create a single page with the final design, then duplicate this file for every single page on the website, editing each copy to add the content. The final and very time-consuming task is to go through every single page and add the links to all of the other pages.

When you are dealing with static websites, drastically changing the design or layout is an absolute nightmare because you have to go through and manually change every single page[*]. If you are using a CMS, this problem completely goes away as all of the displayed pages are generated from a template. If you want to change the logical structure of a page, you only have to change the template.
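The template idea is easy to demonstrate in a few lines. The sketch below is Python using the standard `string.Template` class, not how WordPress or Joomla actually work internally; the page data and markup are made up, and a real CMS would also escape the content.

```python
from string import Template

# One template shared by every page: change it once and the whole site changes.
PAGE_TEMPLATE = Template("""<html>
<head><title>$title</title></head>
<body>
<ul class="nav">$nav</ul>
<h1>$title</h1>
$body
</body>
</html>""")


def render_site(pages):
    """Render every page from the shared template, with links to all pages."""
    nav = "".join(
        '<li><a href="{0}.html">{1}</a></li>'.format(slug, page["title"])
        for slug, page in pages.items()
    )
    return {
        slug + ".html": PAGE_TEMPLATE.substitute(
            title=page["title"], nav=nav, body=page["body"]
        )
        for slug, page in pages.items()
    }


if __name__ == "__main__":
    pages = {
        "index": {"title": "Home", "body": "<p>Welcome.</p>"},
        "about": {"title": "About", "body": "<p>About this site.</p>"},
    }
    for filename, html in render_site(pages).items():
        print(filename)
        print(html)
```

Adding a page means adding one entry to `pages`; the navigation links on every other page are regenerated automatically, which is exactly the chore that has to be done by hand on a static site.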

There are many free and open source CMSs available on the internet, two of the most popular being WordPress and Joomla.

WordPress is the easier to use of the two, at the expense of being less flexible. It’s best known as a blog engine, but it is also capable of maintaining a hierarchical structure of static pages. There are many extensions available for WordPress to extend its functionality and plenty of free templates to change its visual style. WordPress templates can be created by anyone with some basic understanding of HTML; no knowledge of PHP is required. It is an excellent choice for someone wanting to create a blog or a small to medium sized website.

Joomla is more advanced and capable of managing large websites with ease, although as a result it does have a steeper learning curve. Like WordPress, there are a large number of extensions available to extend its functionality. Creating templates for Joomla does require a reasonable understanding of PHP programming, but there are already lots of them available on the internet for free. Joomla is best suited to the creation of medium to large websites; for small sites its complexity and flexibility are overkill.

Static websites are hard to maintain and so are liable to be abandoned for long periods of time. The previous version of this website fell prey to that problem. These days even the cheapest web hosts provide everything you need to run a CMS: PHP and a MySQL database. There is simply no excuse for not using one, unless of course you enjoy making your life difficult.

Footnotes

[*] CSS does help here, but it can only do so much.

3 simple tips for a better browsing experience

For many people, web browsing is one of the most common computing tasks. However, the behaviour of typical web browsers ‘out of the box’ is less than optimal. Add to this the amount of junk which accumulates, such as old bookmarks and browser history. Combined, these things detract from the experience of browsing the web.

Your web browsing experience can be easily improved by following the three simple tips described below.

All of these tips include instructions for implementing them in Firefox, as this is the browser I use. However, most, if not all, of them should be achievable in all recent browsers. As always, Google is your friend.

1: Unclutter your bookmarks

I can’t speak for everyone, but for me the number of bookmarks which are no longer of any relevance tends to build up at an alarming rate. The obvious solution is to go through all of them one by one and delete the ones you don’t need. If the number of bookmarks in your bookmarks menu is reasonably small (less than 50 or so), feel free to do this. However, if your bookmark count is in the hundreds or more, a better approach is called for.

Start by moving all of your current bookmarks into a new folder called “old bookmarks” or similar. After you have done this, continue your browsing as you would normally, except that whenever you use one of your old bookmarks, move it out of the “old bookmarks” folder, either into the top-level menu or into a folder if appropriate.

After a period of a few months all of the commonly used bookmarks will have been retrieved and you can trash the whole “old bookmarks” folder, along with all your useless bookmarks.

Now that your bookmarks are clean, regardless of the method you used to clean them, you need to make it a routine to clear out useless bookmarks to stop them from building up again. This can be done when you open the browser, when you close the browser, when you have time to kill, once a week or whatever works for you.

2: Clean up the interface

Browsers have lots of useless junk in the UI, like status bars, unused menus and large empty gaps, all of which detract from the primary purpose of a browser: displaying web pages. On small screens, such as those found on netbooks, this can also be a major usability problem.

Unbeknown to most users, the interfaces of modern browsers are highly customisable and can be shrunk in size with a little tweaking. In Firefox this can be done by following these steps:

The first step is to disable the bookmarks toolbar in the ‘View->Toolbars’ menu and the status bar in the ‘View’ menu. This alone gets rid of two of the four rows in the default user interface.

The next step involves right-clicking on the gap next to the menu bar and selecting ‘Customise’. In this mode the widget toolbox shows up. Firefox allows you to rearrange the UI items by clicking on them and dragging them to where you want them to go. Start by dragging the URL and search boxes into the empty gap next to the menu bar. Next, get rid of the forward/back, refresh and stop buttons by dragging them onto the widget toolbox. Finally, drag the home button so that it lies between the menu and the URL bar; you may want to get rid of this button as well if you don’t use it. You can now close the widget toolbox with the ‘Done’ button.

The last step is to get rid of the now empty navigation toolbar, which can be done by unticking the ‘Navigation Toolbar’ option in the ‘View->Toolbars’ menu.

The screenshot below shows two instances of Firefox: the top one shows the default UI and the bottom one shows the minimal UI. From this it can be seen that quite a substantial amount of space can be saved using just the built-in customisation options. Firefox’s UI can be further customised using the vast array of add-ons available for the browser. One example is “Menu-mod”, which allows the contents of the browser’s menus to be customised and unused menus to be removed completely.

[Screenshot: comparison of the default Firefox UI and the cleaned-up version]

3: Block Flash and adverts

Recently the proliferation of animated Flash adverts on the internet has skyrocketed. I am sure that you are aware of their placement on sites containing mostly textual content, such as articles. Because the adverts are specifically designed to draw your attention, the main content of the website becomes close to unreadable as you are constantly distracted by them. Additionally, as Flash can be very CPU intensive, it can also cause the entire system to slow down.

The browser add-on developers come to the rescue in this case with add-ons like Adblock Plus and Flashblock.

Adblock Plus works by maintaining a list of URLs of advertising content, such as those served by DoubleClick. If any of them appear in a page you are viewing, that content is simply stripped out. This works well provided the list is kept up to date with all of the ad providers. ABP uses a system of filter subscriptions to achieve this, but it is still not perfect because it does not deal with sites that host their own ads.
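The list-based approach can be sketched in a few lines of Python. This is a deliberately simplified illustration that matches on host names only; it is not Adblock Plus’s real filter syntax or matching logic, and apart from DoubleClick the host names are made up.

```python
from urllib.parse import urlparse

# A tiny stand-in for a filter subscription.
BLOCKED_HOSTS = {"doubleclick.net", "ads.example"}


def is_blocked(url, blocked_hosts=BLOCKED_HOSTS):
    """True if the URL's host is a blocked host or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)


def filter_resources(urls, blocked_hosts=BLOCKED_HOSTS):
    """Keep only the resources a page would still be allowed to load."""
    return [url for url in urls if not is_blocked(url, blocked_hosts)]


if __name__ == "__main__":
    page_resources = [
        "http://news.example/article.html",
        "http://ad.doubleclick.net/banner.swf",
        "http://news.example/ads/own-banner.swf",  # hosted by the site itself
    ]
    print(filter_resources(page_resources))
```

The third URL passes the filter because it comes from the site’s own host, which is the gap in the list-based approach.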

Flashblock solves this problem by replacing all Flash objects with a placeholder image. If the placeholder is clicked, the Flash content is loaded.

The combination of Adblock Plus and Flashblock removes all Flash ads from a page, but still allows you to view worthwhile Flash content like embedded Flash videos on sites like YouTube.

Never buy another piece of software

I have never bought a program; however, I am also not a software pirate.

How?

I use open source software.

Open source software is different because, unlike traditional software, the entire source code of an application is available free of charge, to use, edit and redistribute however you see fit. Because of this openness, development is performed by a distributed community, and more developers looking at the code tends to mean fewer security problems. Also, because there is no money involved, there is no incentive to deliberately break backwards compatibility.

One of the most used applications for most people is an office suite, particularly a word processor. OpenOffice fills this hole nicely.

Firefox: a fast, secure, heavily customisable, standards-compliant web browser that works the way you do.

GIMP: a powerful bitmap image editor.

These examples hardly scratch the surface of what’s available; you can find open source applications to do most tasks. As usual, Google is your friend: try including the words “open source” next time you are searching for a program.