Solving the Twitter reliability problem


Twitter is a web application that is simultaneously seemingly useless, highly addictive and surprisingly useful. It is also the single most unreliable web application that I have ever used.

When the World Cup was on, the infamous “Fail Whale” was a common sight due to the number of people tweeting about their favorite team. But even now, with all that banished to history, Twitter errors are still an extremely common occurrence.

These problems only show one thing: Twitter is broken.

It is clear that Twitter has some serious internal architecture issues, which they are no doubt scurrying to fix. But considering the recent concerns in the web community over Facebook’s privacy and copyright policies, and the inherent single point of failure that comes with any centralised system, I feel that it is time that an open, decentralised alternative was developed.

I doubt anyone has actually noticed, but I have not been online much over the past two weeks. Over this time I have been researching and developing just such an application. And, after many annoying bugs, I have finally completed a working proof of concept.

The code

The code, written in clean MVC-structured PHP, can be found on my GitHub page. You can download the code from that page using the “download source” button, or by cloning it using Git. This application does nothing to improve the reliability or performance of Twitter itself; that is impossible because Twitter is a proprietary system. Instead it is a complete ground-up reimplementation of the same concepts, based around a distributed architecture. Please note that little attention has been paid to the visual styling of the application.

Unlike the existing social networking systems, this system does not have any concept of a central server. It is based around the idea of a distributed mesh, where each user runs their own node on their own server. Combined with remote caching to reduce network traffic, this allows the system to handle a lot of traffic as a whole, while remaining fast from a user’s point of view. For proof that such an architecture works, you need look no further than email.

Some other advantages of a distributed system include:

  • You control your own data, so nobody can sell it to third parties unless you decide to do so yourself.
  • Consequently, only you can censor your posts. Nobody can delete anything because it breaks some usage policy.
  • There is no need for complex load management systems, which keeps the code simple. Simple systems are less likely to suffer from bad security vulnerabilities and are easier to maintain.
  • The protocol is open and you have direct access to the database, completely eliminating the problem of vendor lock-in.

How it works

The design of this system is loosely based on RSS, a format for collecting the content from multiple websites and compiling it into an easy-to-read stream. Each node is a combined server and client, and any messages posted by a user are made available as an XML stream. The URL at which this stream is exposed is called the “follow URL”.
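As a rough illustration of what such a stream might look like, here is a minimal sketch in Python (the actual project is written in PHP). The element and attribute names (`stream`, `message`, `posted`) and the example user are assumptions made up for this sketch, not the format the real code uses.

```python
import xml.etree.ElementTree as ET

def build_stream(username, messages):
    """Serialise a user's messages as an XML stream, in the spirit of an RSS feed.

    The tag names used here are hypothetical; the real stream format may differ.
    """
    root = ET.Element("stream", {"user": username})
    for posted_at, text in messages:
        item = ET.SubElement(root, "message", {"posted": posted_at})
        item.text = text
    return ET.tostring(root, encoding="unicode")

# This is the document a node would serve at a user's follow URL.
stream_xml = build_stream(
    "alice",
    [("2010-08-01T12:00:00Z", "Hello, mesh!")],
)
```

Any node that knows the follow URL can fetch and parse this document without knowing anything else about the server behind it.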

To follow a user on a remote node, the software could just download the stream from the follow URLs of the users that you are interested in, just like RSS. However, there is a problem with this simple model: how to fill in the “followers” list on the remote node.

On its own, this is a pure pull-based protocol, meaning that the clients (all of the users that are following another user) always pull the stream from the followed user’s server. The only way to generate follower statistics from such a system is to count the number of unique requests hitting a URL in a given period of time. Needless to say, this is extremely unreliable, as anyone who has used FeedBurner to count RSS subscribers will know.

To solve this problem the system uses pingbacks, small XML messages which instruct a remote server to do something. When a user on one node follows a user on a different node, the software downloads the remote’s follow URL, extracts the pingback URL (a piece of metadata contained in the stream), and then sends an “add follower” ping to the remote node. The remote then adds the local user to the remote user’s followers list.
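The follow handshake can be sketched roughly as below, again in Python rather than the project’s PHP. Everything concrete here is an assumption for illustration: the `pingback` element, the shape of the ping message, the example URLs, and the idea that the ping is delivered as an HTTP POST.

```python
import urllib.request
import xml.etree.ElementTree as ET

# A hypothetical stream as it might be served from a remote user's follow URL.
SAMPLE_STREAM = """<stream user="bob">
  <pingback>http://bob.example/pingback</pingback>
  <message posted="2010-08-01T12:00:00Z">First post</message>
</stream>"""

def extract_pingback_url(stream_xml):
    """Pull the pingback URL out of the stream's metadata."""
    return ET.fromstring(stream_xml).findtext("pingback")

def build_add_follower_ping(follower_name, follower_follow_url):
    """Build the small XML message asking the remote to record a new follower."""
    ping = ET.Element("ping", {"action": "add-follower"})
    ET.SubElement(ping, "user").text = follower_name
    ET.SubElement(ping, "follow-url").text = follower_follow_url
    return ET.tostring(ping, encoding="unicode")

def send_ping(pingback_url, ping_xml):
    """Deliver the ping over HTTP POST (assumed transport; not called below)."""
    request = urllib.request.Request(
        pingback_url,
        data=ping_xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status

pingback_url = extract_pingback_url(SAMPLE_STREAM)
ping_xml = build_add_follower_ping("alice", "http://alice.example/follow")
```

Including the follower’s own follow URL in the ping is one way the remote could later look that follower up; whether the real implementation does this is not something the description above settles.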

Unfollowing is achieved by basically doing the opposite, but it’s a little more complex because of security. When one user follows another, the following user’s public key is stored on the followed user’s node. The “remove follower” pingback is signed with the following user’s private key, and can only be validated using the public key stored on the remote. This means that a follower entry can only be removed by the user that created it.
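The sign-and-verify flow can be illustrated with a toy example. This Python sketch uses unpadded textbook RSA with a deliberately tiny key built from two well-known Mersenne primes; it is not secure and is not the scheme the real code uses. The message layout and the `followers` table are also assumptions. A real node should rely on a vetted cryptography library.

```python
import hashlib

# Toy key pair: far too small and unpadded for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                                # public modulus
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (needs Python 3.8+)

def digest(message, modulus):
    """Hash the message and reduce it into the range of the modulus."""
    raw = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % modulus

def sign(message, private_exponent, modulus):
    return pow(digest(message, modulus), private_exponent, modulus)

def verify(message, signature, public_exponent, modulus):
    return pow(signature, public_exponent, modulus) == digest(message, modulus)

# On the followed user's node: public keys recorded when each follow happened.
followers = {"http://alice.example/follow": (E, N)}

def handle_remove_follower(follow_url, signature):
    """Remove a follower only if the ping was signed by that follower's key."""
    public_key = followers.get(follow_url)
    if public_key is None:
        return False
    if not verify("remove-follower:" + follow_url, signature, *public_key):
        return False
    del followers[follow_url]
    return True

url = "http://alice.example/follow"
forged_ok = handle_remove_follower(url, 12345)                       # rejected
genuine_ok = handle_remove_follower(url, sign("remove-follower:" + url, D, N))
```

Only the holder of the private exponent can produce a signature that the stored public key accepts, which is what stops a third party from removing someone else’s follow.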

The last major feature is the ability for users to send messages to one another using @ tags. In order to send a message to a remote user it is necessary to know the follow URL that corresponds to the username. As there is no central user list, this is obtained by looking through the user’s followers and following lists. This means that it is only possible to message people who you are following, or who are following you.

Sending the message itself is done using another type of pingback that adds the message to the remote user’s message inbox.

Some features are impossible to implement efficiently in a distributed system. These are the features which require access to all of the users or messages on the entire network. Consequently it is impossible to implement a “find user” feature or global message searching.

The former I do not see as a major problem, because users normally find other users from external websites or the followers/following lists. The latter matters more: hash tags, which depend on global searching, are an important feature, as they allow discussions to be formed around a topic. It should be possible to implement something that creates this effect without the need for a centralised data store.
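The name lookup described above amounts to searching two local lists. A minimal Python sketch, assuming each list is kept as a mapping from username to follow URL (the data layout, the example users and the `@name` pattern are all made up for illustration):

```python
import re

# Hypothetical local state on one node.
following = {"bob": "http://bob.example/follow"}
followers = {"carol": "http://carol.example/follow"}

def resolve_follow_url(username):
    """Find a follow URL locally; there is no central directory to ask."""
    return following.get(username) or followers.get(username)

def find_recipients(message):
    """Map every @name in the message to a follow URL, or None if unknown."""
    names = re.findall(r"@(\w+)", message)
    return {name: resolve_follow_url(name) for name in names}

recipients = find_recipients("@bob @dave lunch?")
```

Here `bob` resolves because the local user follows him, while `dave` maps to `None` because he appears in neither list, which is exactly the limitation described above.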

There is no such thing as a good visual HTML editor

Today we all live in a society that thinks the only way to do things with computers is to use visual tools. Unfortunately this also applies to the world of web design and development. Although there are visual HTML editors, the only thing they are good for is generating large amounts of unmaintainable code. Although it may look that way at first, the problem is not with these tools but with the way the web evolved over time.

HTML was originally designed for sharing physics research documents at CERN. Right from this stage, the basic operation of the language was to attach meaning, not style, to the textual content. This allowed different parts of the text to be marked as different logical structures such as paragraphs and headings. This paradigm is widely known as “what you see is what you mean”, or WYSIWYM.

For a system designed to transmit textual documents, this makes a lot of sense. If the program displaying the document knows that some content is a heading, it can automatically display all of the headings in a pre-set font and size.

After HTML and the internet were released to the public, however, entrepreneurs started to see the potential of the internet and wanted more control over the display of pages. These early web designers discovered that tables could be (mis)used to create page layouts, but there was still no way to specify fonts and colouring. Netscape, one of the popular browsers at the time, took it upon itself to solve this problem, resulting in the addition of the “font” and “center” tags. Together these allow the font, colouring and centering of various page elements to be specified.

Around this time, the first visual HTML editors started to appear. These programs used nested tables and the Netscape formatting tags to emulate the “what you see is what you get” (WYSIWYG) paradigm popularised by visual word processors. Here lies the root of the problem with visual HTML editors: the language is based around the WYSIWYM paradigm, not WYSIWYG. In order to achieve their purpose of allowing HTML to be edited visually, they made a lot of compromises, inlining styling information all over the code. Inlining styling information completely breaks the original advantage of the language, the separation of content and styling. Because of this, websites created by these early design methods can only be described one way: completely unmaintainable.

Unsurprisingly this did not continue for long. A solution was developed in the form of Cascading Style Sheets (CSS), a way of externally specifying the styling and layout of the various elements of a web page. Microsoft jumped on CSS with version 6 of its Internet Explorer browser. Unfortunately for the WYSIWYG HTML editor, it did not comply with the standard.

While HTML and CSS continued to develop for many years, with the rise of browsers like Firefox and Safari, Internet Explorer 6 managed to hold onto most of the browser market. Because all the other available browsers implemented the newer HTML standards while IE6 was still dragging behind, it was extremely difficult to create web pages that worked across multiple browsers. To make things even worse, when Microsoft came out with version 7 of Internet Explorer in 2006, it still failed to comply with the standard.

To work around the oddities of Internet Explorer a number of hacks were created. Unsurprisingly the visual editors attempted to adopt these hacks into their code output, which ultimately just resurrected the problem of inlined styling and consequently the output of unmaintainable code.

WYSIWYG only works when you can guarantee that a display device will display something in a given way. Because of all of the differences between browsers this is not the case. Currently it is completely impossible to implement a WYSIWYG HTML editor that both works and produces clean code. If at some point in time all browsers implement the same standard (which is highly unlikely as Microsoft always lags behind massively with Internet Explorer) then it would be possible to implement a working visual HTML editor. But even if that does become the case, you are still better off hand-writing your HTML/CSS.

The thing is, the web is an extremely flexible platform. If you use a visual HTML editor, you are directly constraining yourself to the subset of the platform implemented by the tool. If any new design techniques are developed after the tool was published, you cannot use them without hand editing the code. What’s more, many of the techniques used in modern web design, for example CSS sliding doors, are impossible to represent in a visual manner.

All you need to design and develop for the web is a plain text editor, preferably one with syntax highlighting and auto indent. You can find many excellent text editors available on the internet for free, such as Notepad++, Emacs and Vim.

If you are using Dreamweaver because it attempts to make editing static HTML websites less of an impossibility, stop living in the past: use a content management system.

Static websites, just don’t go there

If you want to create a website and are thinking about creating a static HTML site with something like Dreamweaver, stop right there. The web has evolved far beyond the days of static HTML, allowing absolutely anybody on the planet to easily create and maintain high quality websites. This is thanks to the evolution of the Content Management System (CMS).

Static websites are an absolute nightmare to maintain. Just adding one page is a major task: you have to download every HTML file, add the page, go through all of the other pages and update the links, then re-upload the entire site again. On the other hand, if you are using a CMS, this whole task is as simple as logging into your CMS’s administrator interface and clicking on the “new article” button, at which point you can add your new page and edit it using a simple WYSIWYG (visual) editor right in the browser. The software does everything else for you.

A CMS-based website is also a lot easier to create in the first place. All you need to do is install the software on a server and add your content using the CMS’s web interface. Lastly you just pick or create a template for the visual styling. Simple. If you went down the static route, your life would be a lot harder. First you would need to create a single page with the final design, then duplicate this file for every single page on the website, editing each copy to add the content. The final and very time-consuming task is to go through every single page and add the links to all of the other pages.

When you are dealing with static websites, drastically changing the design or layout is an absolute nightmare because you have to go through and manually change every single page[*]. If you are using a CMS, this problem completely goes away as all of the displayed pages are generated from a template. If you want to change the logical structure of a page, you only have to change the template.
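To make the template idea concrete, here is a small Python sketch of how a CMS-like system might generate every page from one template. It is not how WordPress or Joomla work internally (both are PHP applications backed by a database); the template, the page data and the `render_site` helper are all invented for illustration.

```python
from html import escape
from string import Template

# One template shared by every page: change it once, every page changes.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><ul>$nav</ul>\n<h1>$title</h1>\n<p>$body</p></body></html>"
)

# In a real CMS this content would live in a database.
pages = {
    "index": {"title": "Home", "body": "Welcome."},
    "about": {"title": "About", "body": "Who we are."},
}

def render_site(pages, template):
    """Render every page, rebuilding the navigation links from the page list."""
    nav = "".join(
        '<li><a href="{0}.html">{1}</a></li>'.format(slug, escape(page["title"]))
        for slug, page in pages.items()
    )
    return {
        slug: template.substitute(
            title=escape(page["title"]), body=escape(page["body"]), nav=nav
        )
        for slug, page in pages.items()
    }

# Adding a page is one new record; the links on all other pages follow.
pages["contact"] = {"title": "Contact", "body": "Get in touch."}
site = render_site(pages, PAGE_TEMPLATE)
```

With a static site, the equivalent of that last step is editing the navigation by hand in every existing file.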

There are many free and open source CMSs available on the internet, two of the most popular being WordPress and Joomla.

WordPress is the easier to use of the two, at the expense of being less flexible. It’s best known as a blog engine, but it is also capable of maintaining a hierarchical structure of static pages. There are many extensions available for WordPress to extend its functionality and plenty of free templates to change its visual style. WordPress templates can be created by anyone with some basic understanding of HTML; no knowledge of PHP is required. It is an excellent choice for someone wanting to create a blog or a small to medium sized website.

Joomla is more advanced and capable of managing large websites with ease, although as a result it does have a steeper learning curve. Like WordPress, there are a large number of extensions available to extend its functionality. Creating templates for Joomla does require a reasonable understanding of PHP programming, but there are already lots of them available on the internet for free. Joomla is best suited to the creation of medium to large websites; for small sites its complexity and flexibility are overkill.

Static websites are hard to maintain and so are liable to be abandoned for long periods of time. The previous version of this website fell prey to that problem. These days even the cheapest web hosts provide everything you need to run a CMS: PHP and MySQL. There is simply no excuse for not using one. Unless of course you enjoy making your life difficult.

Footnotes

[*] CSS does help here, but it can only do so much.