Wednesday, 1 October 2014

It is a bad plan that admits of no modification

I find it somewhat interesting that thousands of years later that our society still uses Publilius Syrus sententiae though I imagine the tendency to leave well enough alone means such phrases stay in usage.

Marvell ARM system - Photo from Steve McIntyre
One weekend Steve McIntyre asked me if I could find a source of some of some 40mm fans for some systems with some pretty strict requirements. They needed to be long life and shift a lot of air to combat a persistent overheating issue.

I sat with him and went through the Farnell utterly hateful parametric web interface and eventually came up with a couple of options which were very expensive. Only then did I stop and ask what the actual problem was.

Marvell ARM system Original internal cooling arrangement - Photo from Steve McIntyre
Steve showed me one of the Debian ARM buildd boxes which are Marvell development machines. These systems are powerful quad core machines housed in compact steel enclosures.

There is a single 40mm fan trying to provide cooling for the entire enclosure. When the units are placed horizontally and used intermittently this proves adequate. Unfortunately when the system are arranged vertically in a rack and run at full load continuously they often overheat and have to be restarted. In addition the small high speed fans need replacing frequently as their bearings wore out quickly.

Debian ARM buildd systems - Photo from Steve McIntyreThis was obviously causing some issues for the ARM Debian ports which Steve wanted to rectify. After talking the problem through for a while we came to the conclusion we could use much larger 60mm fans to blow air directly through the top of the case onto the cpu heatsink.

Larger fans can be run much more slowly to move a similar volume of air to the smaller 40mm fans which gives a much longer service life.

Hole punch and Drilling template
Steve proceeded to order enough parts to allow us to modify all the Debian systems, this worked out cheaper than a single "special" 40mm high volume fan.

I acquired a rather large steel hole punch, I chose this tool because it produces a much superior finish to a hole cutter and this project demanded a high level of finish (not to mention I loved having a valid excuse to own and use a huge allen key!)

If we had simply been modifying a single case I would have measured and marked up by hand. With the prospect of altering at least eight I laser cut a template from plywood which Andy Simpkins took great glee in excessively annotating.

We also used the opportunity to add bolt holes to securely attach the 2.5 inch SATA drives instead of using sticky pads.

Steve and I modified a single system to begin with both to check our alignment and the efficacy of the change. We were pleasantly surprised to discover that hoiby could now repeatedly do kernel compiles with all four cores flat out which was not possible before. The measured CPU temperature, which had previously been around 90°C, did not rise above 40°C

Steve and Andy on the assembly line
Steve, Andy and I then arranged a day where we took all the remaining units out of the rack at ARM, modified and returned them. We used the facilities at the Cambridge Makespace where I am a member to do the modifications.

I broke two 3mm drill bits and dulled a 4mm bit drilling all the holes, Roger Smith was good enough to loan us the use of his "Christmas tree bit" to ream the fan hole out to 16mm so we could thread the hole punch and cut the 60mm fan aperture out.

six modified systems ready to be re-racked.
We managed to get quite an assembly line going and, in my opinion, the results look pretty professional.

It has been several months since we did this work and these systems continue to run without issue. To complete the story we can see some graphs courtesy of the DSA munin instance.

CPU load on arnold.debian.org
You can clearly see the huge drop in temperature at the end of Week 25 despite the continuously high CPU load. Also there is only a single gap in the data after the changes (these indicate crashes where data was not recorded) where before there were frequent and extensive times where the systems were simply unusable.

CPU Temperature of arnold.debian.org
One reason I continue to enjoy Debian so much is the wide variety of ways in which I can contribute not only by maintaining my packages. Sometimes this kind of work does not receive the credit it deserves and hopefully highlights a small part of the frantic paddling that goes on under the serene surface of the Debian project to keep things "just working".

Wednesday, 24 September 2014

I wanted to go to Portland because it's a really good book town.

Plane at Heathrow terminal 5 taking me to America for Debconf 14Patti Smith is right, more than any other US city I have visited, Portland feels different. Although living in Cambridge, which sometimes feels like where books were invented, might give me a warped sense of a place.

Jo McIntyre getting on the tram at PDX
I have visited Portland a few times previously and I feel comfortable every time I arrive at PDX. Sure the place still suffers from the american obsession with the car but similar to New York you can rely on public transport to get about.

On this occasion my visit was for the Debian Conference which i was excited to attend having missed the previous one in Switzerland. This time the conference has changed its format to being 10 days long and mixing the developer time in with the more formal sessions.

The opening session gave Steve McIntyre and myself the opportunity to present a small token of our appreciation to Russ. The keynote speakers that afternoon were all very interesting both Stefano Zacchiroli and Gabriella Coleman giving food for thought on two very different subjects.

The sponsored accomodation rooms were plesent
Several conferences in the past have experienced issues with sponsored accommodation and food, I am very pleased to report that both were very good this time. The room I was in had a small kitchen area, en-suite bathroom, desks and most importantly comfortable beds.

Andy and Patty in the Ondine dining area
The food provision was in the form of a buffet in the Ondine facility. The menu was not greatly varied but catered to all requirements including vegetarian and gluten free diets.

Neil, Rob, Jo, Steve , Neil, Daniel and Andy dining under the planes
Some of us went on a visit to the Evergreen air and space museum to look at some rare aircraft and rockets. I can thoroughly recommend a visit if you are in the area.

These are just the highlights of the week though, the time in the hack-labs was productive with several practical achievements Including:
- Uploading new packages reducing the bug count
- Sorting out getting an updated key into the Debian keyring.

Overall I had a thoroughly enjoyable time and got a lot out of the conference this year. The new format suited me surprisingly well and as usual the social side was as valuable as the practical.

I hope the organisers have recovered enough to appreciate just how good a job they did and not get hung up on the small number of things that went wrong when the majority of things went perfectly to plan.

Monday, 15 September 2014

NetSurf 3.2

We recently released a new version of NetSurf this was largely to address numerous small bugs but did also include the persistent caching implementation I have written about previously. A release used to require the release manager (usually me) to perform a lot of manual processes and while we had a checklist it was far too easy to miss things.

The Continuous Integration (CI) system combined with signed release tags in git has resulted in a greatly simplified process indeed it has become almost completely automated. The majority of the manual work is now confined to doing the tasks that require actual decision making and checking we are releasing what was intended.

By having the CI system build release binaries the project now has a much clearer and importantly traceable process, I can recommend such a system to any project that produces releases especially if they release binaries for any of their targets.

I have also managed to package and upload this version of NetSurf ready for the Debian Jessie release. I would like to thank Jonathan Wiltshire for his assistance in ensuring this was a good quality package.

The release incorporates the successfully merged work of Rupinder Singh who was our our GSoc 2014 student. Rupinder mainly made improvements to our core DOM implementation and was very responsive and enthusiastic throughout his time despite the mentor team sometimes not being available.

This work goes towards improving NetSurf in the future by ensuring the underlying features are present in our core libraries. The GSoc mentors and all project developers are all pleased with the results of this years GSoc participation and would like to thank everyone involved in making our participation possible.

Along with the good news comes a little bad:
PowerPC Mac OS X
Despite repeated calls for assistance with new hardware and Java builds none has been forthcoming meaning that from this release we ware no longer able to ship PowerPC builds for MAC OS X.

The main issue is the last version of MAC OS X that runs on PPC is Leopard and there is no viable Java 1.6 port necessary for our CI system to run. Additionally the fully loaded PPC Mac mini (kindly donated to us by Mythic Beasts) had become far too slow to keep up with our builds and was causing long delays.
Bugs
NetSurf 3.2 Bug graph
We have a lot of bugs, in fact just during this release cycle we have 30 more bugs reported than we closed.So while the new bug reporting system has been a success and our users are reporting issues when they find them the development team is not keeping up..

The failure to keep up stems from the underlying issue of lack of manpower. We have relatively few active developers which is especially problematic when there are many users for a platform, such as RISCOS, but the maintainer is unable to commit enough time to fixing issues.

If you would like to help making NetSurf a better browser we are always happy to work with new contributors.

Sunday, 24 August 2014

Without craftsmanship, inspiration is a mere reed shaken in the wind.

While I imagine Johannes Brahms was referring to music I think the sentiment applies to other endeavours just as well. The trap of believing an idea is worth something without an implementation occurs all too often, however this is not such an unhappy tale.

Lars original design idea
Lars Wirzenius, Steve McIntyre and myself were chatting a few weeks ago about several of the ongoing Debian discussions. As is often the case these discussions had devolved into somewhat unproductive noise and yet amongst all this was a voice of reason in Russ Allbery.

Lars decided that would take the opportunity of the upcoming opportunity of Debconf 14 to say thank you to Russ for his work. It was decided that a plaque would be a nice gift and I volunteered to do the physical manufacture. Lars came up with the idea of a DEBCON scale similar to the DEFCON scale and got some text together with an initial design idea.

CAD drawing of cut paths in clear acrylic
I took the initial design and as is often the case what is practically possible forced several changes. The prototype was a steep learning curve on using the Cambridge makespace laser cutter to create all the separate pieces.

The construction is pretty simple and consisted of three layers of transparent acrylic plastic. The base layer is a single piece of plastic with the correct outline. The next layer has the DEBCON title, the Debian swirl and level numbers. The top layer has the text engraved in its back surface giving the impression the text floats above the layer behind it.

Failed prototype DEBCON plaqueFor the prototype I attempted to glue the pieces together. This was a complete disaster and required discarding the entire piece and starting again with new materials.

The final version with stand ready to be presented
For the second version I used four small nylon bolts to hold the sandwich of layers together which worked very well.

Presentation of plaque photo by Aigars Mahinovs
Yesterday at the Debconf 14 opening Steve McIntyre presented it to Russ and I think he was pleased, certainly he was surprised (photo from Aigars Mahinovs).

The design files are available from my design git repo, though why anyone would want to reproduce it I have no idea ;-)

Wednesday, 16 July 2014

It is no great secret that my colleagues at Collabora have been doing work with the Raspberry Pi Foundation.

My desk is very near Marco and I often see him working with the various Pi boards. Recently he obtained one of the new B+ units for testing and I thought it looked a little sad sat naked on his desk.

To remedy this bare board problem I designed and built a laser cut a case for him and now the B+ has been publicly released I can make the design freely available.

The design is completely original though is inspired by several other plastic "clip" type designs I have seen. Originally I created and debugged the case design for my parallella though tweaking it for the Pi was pretty easy.

The design is under a CC attribution licence and I ought to say that my employer is in no way responsible for this, its all my own fault.

Wednesday, 26 February 2014

There are only two hard things in Computer Science: cache invalidation and naming things.

It is the first of these which I have recently been attempting and I think Phil Karlton might have a good point.

What are we talking about?

Web browsers are pretty complex beasties but the basic concept is pretty easy to understand. They fetch a bunch of files that make up a web page and render those source files into something suitable for human consumption.

It is also intuitive that fetching all the files that make up a web page, every time you browse to a new page, might be wasteful, especially if most of the files had not changed from the previous page.

In order to address this browsers hang onto copies of these files in case your browsing needs them again, this is known as a source file cache. Caches are a widely used technique throughout many aspects of computer technology but the basic idea is that they trade one resource for another.

In this case we are trading local storage space (memory or disc) for access time. This type of trade may give large benefits due to the large differences in access times (sometimes known as latency) between local and network resources.

For example, if the source files that make up a web page are a Megabyte in size accessed with a 2013 era PC with a 10Mbit connection to the Internet:
  • Accessing the data from main memory will take approximately 20µs (millionths of a second).
  • The same data retrieved from a hard disc drive will take 4,000µs to arrive, around 200 times longer.
  • If we were to retrieve the data from a fast web server and assuming the network connection is in perfect working order, the data will roll in 1,200,000µs later, or around 300 times more slowly than disc and a unfathomable 60,000 times slower than memory.
From this example it becomes startlingly obvious why a source cache is so desirable. I have greatly simplified the browsers operation here as they usually implement many layers of additional caching in other areas to make the browsing experience subjectively smoother.

And before the observant reader suggests that we could just make the network faster it ought to be mentioned that due to fundamental physical constraints, like the speed of light, it would be unlikely that the average practical network reply will ever drop much below 150,000µs [1] or some 40 times slower than disc. Still a worthwhile gain.

Knowing the cost of everything and the value of nothing

Having shown the benefits it might be nice to think there are no downsides to caching, this is not the case, as I mentioned we are trading storage for time and as in any trade one must expect there to be overheads.

In the case of a browser we face several obstacles. Firstly not every file that goes to make up a web page can be cached. The HTTP specification contains an entire section on caching which contains numerous rules and operations to ensure that only objects that should be cached are and determines for how long they can be kept before going "stale".

A great deal of this complexity comes from just how desirable caching is to network providers. Many ISPs will have web proxy servers to cache requests for all users to reduce their bandwidth usage, or increasingly, to implement content filtering according to the local government censorship rules. The browsers cache must know how to interact with these proxies without getting erroneous results.

Then comes the problem alluded to in the section title. A browsers cache cannot grow indefinitely. Eventually a system would run out of memory and disc if a browser kept every file. To deal with this the cache is limited sometimes by the number of files it holds, but more commonly by size both in memory and on disc.

The caching structure used by a web browser is hierarchical in that it will first evict (move) files from memory to a backing store (disc) and when the disc cache size limit is exceeded files are evicted to the next tier. Which in this case, will involve deleting the local copy (invalidating) and requiring performing a fetch from the network if the file is needed again.

Tardis at the Beeb by Sarah G from flikrWhen the cache size is exceeded the decision is made about what to remove from the cache using a cache strategy or algorithm. The aim of this decision is to discard the data that will not be required again for the longest time.

If we had a blue box with a madman inside, who we could persuade to bring our browsing history back from the future, implementing bélády's algorithm might be possible. Failing a perfect solution we are reduced to making a judgement based on the information available to us.

This involves computing a cost for replacing each entry in the cache and evicting those with the lowest values. The selected algorithm has a large impact on the effectiveness of the cache and often needs fine tuning for the task at hand.

Never has so little been measured so much

Having established that caching is useful and that we need to make decisions on what to keep it follows that those decisions must be based on information. This information consists of two main groups:
  • A series of metrics about the cache state such as the number of objects and their cumulative size currently being stored in memory.
  • Information about each object, known as metadata, this includes things like the objects individual size, how long until it becomes stale and when it was last used.
Maintaining all of this information is a significant part of the overheads involved with caching and a compromise must be struck between having the necessary information to make good caching choices and the cost maintaining that data.

Cache metrics

Most cache implementations maintain at least some basic metrics, at the very least how much storage is being used so a decision on if evicting entries from the cache to reclaim space is necessary.

It is also common to keep a record of how well the cache is performing. The cache efficiency (the hit rate) is usually measured as the ratio of successful accesses provided from the cache (a cache hit) to the total number of requests to the cache.

A perfect cache would have all hits and no misses (a hit rate of 1) while using as little storage as possible (but as already noted that requires a special madman). The hit rate can be used to automatically change the behaviour of the eviction algorithm, perhaps using more storage if the hit rate drops too low or changing the eviction frequency.

Metadata

Each object must carry with it the data necessary to regain its state when retrieved from the cache. This information is essential for the correct operation of the cache but in of itself provides no direct value.

In the case of browser source data this includes all the headers sent in the original network fetch. These headers are themselves just metadata sent along with the object by the web server which must be preserved.

Two particularly useful value for making a good eviction decision are how long it has been since the object was last used and how many times it has been used. This is because an object that was cached a long time ago and then never used again is much less valuable than one that has recently been repeatedly used.

An Implementation

It is all very well talking about the general solution but a full understanding can only be gained by examining a real implementation. The NetSurf cache was chosen as it is a suitably small amount of code but still demonstrates all the major design features.

Interface

The cache deals with all source data and provides the browser with a single unified object request interface regardless of if the source data can be cached or not. This is useful to hide the implementation details allowing for changes without affecting the rest of the browsers code.

The implementation effectively consists of two lists of objects, those that can be cached and those that cannot. Each object on these lists is reference counted against one or more users and while there is at least one user the source data is maintained in memory.

Objects determined to be non cached are always directly fetched from their source and are immediately released once they have no users remaining.

For cacheable objects an attempt is made to fulfil the request in descending order:
  1. from objects already in memory.
  2. from the backing store.
  3. from a network fetch.
Irrespective of the source of the data the remaining operations on the object (determining freshness etc.) are exactly as if the request had been fulfilled from the network. This approach means the use of the cache is transparent to the object users.

Writeout

Objects are placed in the backing store asynchronously to any other operations. When an object has been fetched from the network a background task is scheduled for when the browser is otherwise idle.

This writeout task constructs a list of all cacheable objects not yet in the backing store, associates a cost value with each object and then proceeds to write those objects to the backing store in highest to lowest value order.

Cleaning

A periodically scheduled task deals with ensuring the cache is cleaned. This consists of destroying stale objects and arranging fresh objects are not using more memory than the configuration permits.

Memory usage is reduced within the cleaning task by discarding the source data for objects already held in the backing store (where the data can be retrieved relatively cheaply)

Additional memory may be recovered by simply discarding unused objects held only in memory. These objects are usually those which have only recently become unused otherwise the writeout task would have committed them to the backing store.

The most recently used objects are statistically likely to be used again soon, because of this there is a high risk of a cache miss associated with discarding these objects. In order to mitigate this undesirable effect the cleaning heuristic favours using more memory than configured for short periods rather than discarding from memory.

Backing store

The backing store is a simple key-value database. Each source data object and its metadata (its headers etc.) is associated with a unique key (the URL used to fetch it). The backing store is simply required to store and retrieve the object data associated with the given URL.

The backing store implementation in NetSurf is pluggable, this flexibility is required for dealing with the greatly varying capabilities and limits of the various systems the browser is required to execute on.

A novel aspect of this backing store interface is that if an implementation returns the wrong object it is not considered an error and is simply treated as a cache miss. This is possible because the metadata contains the URL of the stored object enabling object verification.

This behaviour is permitted because techniques which significantly improve key-value store performance (principally key hashing) become available if they are not required to always give the correct answer.

The backing store is also required to manage its size within the configured limits and deal with any filesystem behaviour details.

The reference backing store is trivial in that it performs a hash operation on the input URL, encodes the result with base64url encoding and uses that as the object filename. The length of the hash can be configured allowing use of the reference implementation in any situation where using the filesystem as a database is acceptable.

The reference backing store default hash is SHA-1 yielding almost unique [2] 160 bit key values stored in much the same way as the git DVCS. Note we are not using this hash for its cryptographic properties and at worst a collision will result in the cache being a tiny amount less efficient.

Conclusion

Hopefully this has shown what a browser source file cache is, why it is useful and a basic understanding of how they are implemented.

I will admit I have glossed over some of the more challenging aspects especially in relation to actually implementing the cache strategy but I hope that the reader will forgive those omissions of detail in a quest for a little more clarity of the general principles.

I would like to thank John-Mark Bell for implementing the NetSurf cache and to Melodie Parry for proof reading and providing feedback.

[1] Speed of light is 3x10^8m/s earth circumference is 4x10^7m, divide circumference by speed gives 133,000µs which gives longest round trip time from anywhere on surface of earth to point furthest from. So assuming a whole megabyte can be transferred in one round trip and allowing for some overhead the lower bound estimate of 150,000µs looks reasonable.

[2] Mathematically speaking this hash would allow us to say that with the current estimated 1x10^11 URLs on the internet the likelihood of a collision is still vanishingly small (at least 1x10^-15)

Tuesday, 7 January 2014

Healthy discontent is the prelude to progress

Back in the mists of Internet time (2002) when spirits were brave, the stakes were high, men were real men, women were real women and small furry creatures from Alpha Centauri were real small furry creatures from Alpha Centauri there were few hosting options for a budding open source project.

Despite the meteoric rise of many dot com companies in the early noughties a project either had to go it alone by running everything themselves or pick from a small selection of companies that had not yet worked out how to turn free hosting into money.

One such company, arguably the first, was VA Research with their SourceForge system. Many projects used their platform including a small niche web browser called NetSurf. For years the service was great, there were rough edges but nothing awful.

Netsurf issue tracker in the new SourceForge interface
Over time the NetSurf projects requirements grew beyond what SourceForge could provide and service after service was migrated away eventually all that was left was the bug tracking system. This remained the state of affairs until mid 2013 when SourceForge forced a migration to their new platform which made them unsuitable for the projects use case.

Aside from the issue trackers questionable user interface SourceForge had started aggressively placing advertising throughout their platform. Some of the placements so inadvisable that projects started taking the decision to leave.

While I appreciate that SourceForge had to make money to provide a service they appear to have sown discontent within a large part of their user base without understanding that there are a number of alternatives solutions with a much less onerous funding model.

NetSurf used this as an opportunity to move the remaining issue tracking service to our own infrastructure. Rob Kendrick proceeded to evaluate several solutions and in December 2013 I finally found the time to migrate an XML dump of the old data from SourceForge into MantisBT.

Migrating data from one database into another via incompatible formats took me back to my roots. My early career started with programming tasks moving historical business data from ancient large systems which were about to be scrapped to modern Sparc based systems. Later I would be in a role where financial data needed to be retrieved from obsolete proprietary systems and moved into databases on x86 servers.

My experience in this field was not really stretched as it turns out that modern systems can process a few tens of megabytes in seconds rather than the days for a run in my youth! So some ugly perl scripts and a few hours later I had a nice shiny SQL database filled with NetSurfs bugs and MantisBT instance configured to use them.

NetSurf MantisBT instance showing most recently updated open bugs
Then came the hard bit, triaging all the open bugs, fixing up all the bugs submitted as an anonymous user but with email addresses in, removing duplicates and checking every open bug was still valid took almost two weeks of tedious drudge.

I set up an initial bug workflow within the system that the project developers are still fine tuning to better suit their needs but overall Mantis is proving a very flexible tool. The main deficiencies center around configuration for the projects useage, especially removing unused fields from filters and making the workflow more intuitive..

The resulting system is now getting bug reports submitted again where the sourceforge system had had three in the six months since the forced migration.

The issue tracker is once more a useful tool for the developers allowing us to focus on areas actually causing problems for our users and allowing us to see progress we are making fixing issues.

Overall this was a successful migration and provides a platform the NetSurf project can control where we can offer guarantees to our users about their personal information usage and having a clean rapid interface with no advertisements.