Sunday 13 March 2016

I changed my mind, Erase and rewind

My recent rack design turned out to simply not be practical. It did not hold all the SBC I needed it to and most troubling accessing connectors was impractical. I was forced to remove the enclosure from the rack and go back to piles of SBC on a shelf.

View of the acrylic being laser cut through the heavily tinted window
This sent me back to the beginning of the design process. The requirement for easy access to connectors had been compromised on in my first solution because I wanted a compact 1U size. This time I returned to my initial toast rack layout but retaining the SBC inside their clip cases.

By facing the connectors downwards and providing basic cable management the design should be much more practical.

My design process is to use the QCAD package to create layered 2D outlines which are then converted from DXF into toolpaths with Lasercut CAM software. The toolpaths are then uploaded to the laser cutter directly from the PC running Lasercut.

Assembled sub rack enclosureDespite the laser cutters being professional grade systems the Lasercut software is a continuous cause of issues for many users, it is the only closed source piece of software in the production process and it has a pretty poor user interface. On this occasion my main issue with it was my design was quite large at 700mm by 400mm which caused the software to crash repeatedly. I broke the design down into two halves and this allowed me to continue.

Once I defeated the software the design was laser cut from 3mm clear extruded acrylic. The assembled is secured with 72 off M3 nuts and bolts. The resulting construction is very strong and probably contains much more material than necessary.

One interesting thing I discovered is that in going from a 1U enclosure holding 5 units to a 2U design holding 11 units I had increased the final weight from 320g to 980g and when all 11 SBC are installed that goes up to a whopping 2300g. Fortunately this is within the mechanical capabilities of the material but it is the heaviest thing I have ever constructed from 3mm acrylic.

bolted into the rack and operatingOnce installed in the rack with all SBC inserted and connected this finally actually works and provides a practical solution. The self is finally clear of SBC and has enough space for all the other systems I need to accommodate for various projects.

As usual the design files are all freely available though I really cannot see anyone else needing to replicate this.

Tuesday 1 March 2016

Hope is tomorrow's veneer over today's disappointment.

Recently I have been very hopeful about the 96boards Hikey SBC and as Evan Esar predicted I have therefore been very disappointed. I was given a Hikey after a Linaro connect event some time ago by another developer who could not get the system working usefully and this is the tale of what followed.

The Standard Design

This board design was presented as Linaro creating a standard for the 64bit Single Board Computer (SBC) market. So I had expected that a project with such lofty goals would have considered many factors and provided at least as good a solution as the existing 32bit boards.

The lamentable hubris of creating a completely new form factor unfortunately sets a pattern for the whole enterprise. Given the aim towards "makers" I would have accepted that the system would not be a ATX PC size motherboard, but mini/micro/nano and pico ITX have been available for several years.

If opting for a smaller "credit card" form factor why not use one of the common ones that have already been defined by systems such as the Raspberry Pi B+? Instead now every 96Board requires special cases and different expansion boards.

Not content with defining their own form factor the design also uses a 8-18V supply, this is the only SBC I own that is not fed from a 5V supply. I understand that a system might require more current than a micro USB connector can provide, but for example the Banana Pi manages with a DC barrel jack readily capable of delivering 25W which would seem more than enough

The new form factor forced the I/O connectors to be placed differently to other SBC, given the opportunity to concentrate all connectors on one edge (like ATX designs) and avoid the issues where two or three sides are used. The 96board design instead puts connectors on two edges removing any possible benefit this might have given.

The actual I/O connectors specified are rather strange. There is a mandate for HDMI removing the possibility of future display technology changes. The odd USB arrangement of two single sockets instead of a stacked seems to be an attempt to keep height down but the expansion headers and CPU heatsink mean this is largely moot.

The biggest issue though is mandating WIFI but not Ethernet (even as an option), everything else in the design I could logically understand but this makes no sense. It means the design is not useful for many applications without adding USB devices.

Expansion is presented as a 2mm pitch DIL socket for "low speed" signals and a high density connector for "high speed" signals. The positioning and arrangement of these connectors proffered an opportunity to improve upon previous SBC designs which was not taken. The use of 2mm pitch and low voltage signals instead of the more traditional 2.54mm pitch 3.3v signals means that most maker type applications will need adapting from the popular Raspberry Pi and Arduino style designs.

In summary the design appears to have been a Linaro project to favour one of their members which took a Hisilicon Android phone reference design and put it onto a board with no actual thought beyond getting it done. Then afterwards attempted to turn that into a specification, this has simply not worked as an approach.

My personal opinion is that this specification is fatally flawed and, is a direct cause of, the bizarre situation where the "consumer" specification exists alongside the "enterprise" edition which itself has an option of microATX form factor anyhow!

The Implementation

If we ignore the specification appearing to be nothing more than a codification of the original HiKey design we can look at the HiKey as an implementation.

Initially the board required modifying to add headers to attach a USB to 1.8V LVTTL serial adaptor on the UART0 serial port. Once Andy Simpkins had made this change for me I was able to work through the instructions and attempt to install a bootloader and OS image.

The initial software was essentially HiSilicon vendor code using the Android fastboot system to configure booting. There was no source and the Ubuntu OS images were an obvious afterthought to the Android images. Just getting these images installed required a great deal of effort, repetition and debugging. It was such a dreadful experience this signalled the commencement one of the repeated hiatuses throughout this project, the allure of 64 bit ARM computing has its limits even for me.

When I returned to the project I attempted to use the system from the on-board eMMC but the pre-built binary only kernel and OS image were very limited. Building a replacement kernel , or even modules for the existing one proved fruitless and the system was dreadfully unstable.

I wanted to use the system as a builder for some Open Source projects but the system instability ruled this out. I considered attempting to use virtualisation which would also give better system isolation for builder systems. By using KVM running a modern host kernel and OS as a guest this would also avoid issues with the host systems limitations. At which point I discovered the system had no virtualisation enabled apparently because the bootloader lacked support.

In addition to these software issues there were hardware problems, despite forcing the use of USB for all additional connectivity the USB implementation was dreadful. For a start all USB peripherals have to run at the same speed! One cannot mix full (12Mbit) and high speed (480Mbit) devices which makes adding a USB Ethernet and SATA device challenging when you cannot use a keyboard.

And because I needed more challenges only one of the USB root hubs was functional. In effect this made the console serial port critical as it was the only reliable way to reconfigure the system without a keyboard or network link (and WIFI was not reliable either)

After another long pause in proceedings I decided that I should house all the components together and that perhaps being out on my bench might be the cause of some instability. I purchased a powered Amazon basics USB 2 hub, an Ethernet adaptor and a USB 2 SATA interface in the hope of accessing some larger mass storage.

The USB hub power supply was 12V DC which matched the Hikey requirements so I worked out I could use a single 4A capable supply and run a 3.5inch SATA hard drive too. I designed a laser cut enclosure and mounted all the components. As it turned out I only had a 2.5inch hard drive handy so the enclosure is a little over size. If I were redoing this design I would attempt to make it fit in 1U of height and be mountable in a 19inch rack instead it is an 83mm high (under 2U) box.

A new software release had also become available which purported to use an UEFI bootloader after struggling to install this version unsuccessfully, made somewhat more problematic by the undocumented change from UART0 (unpopulated header) to UART3 on the low speed 2mm pitch header. The system seemed to start the kernel which panicked and hung either booting from eMMC or SD card. Once again the project went on hold after spending tens of hours trying to make progress.

Third time's a charm

As the year rolled to a close I once again was persuaded to look at the hikey, I followed the much improved instructions and installed the shiny new November software release which appears to have been made for the re-release of the Hikey through LeMaker. This time I obtained a Debian "jessie" system that booted from the eMMC.

Having a booted system meant I could finally try and use it. I had decided to try and use the system to host virtual machines used as builders within the NetSurf CI system.

The basic OS uses a mixture of normal Debian packages with some replacements from Linaro repositories. I would have prefered to see more use of Debain packages even if they were from the backports repositories but on the positive side it is good to see the use of Debian instead of Ubuntu.

The kernel is a heavily patched 3.18 built in a predominantly monolithic (without modules) manner with the usual exceptions such as the mali and wifi drivers (both of which appear to be binary blobs). The use of a non-mainline capable SoC means the standard generic distribution kernels cannot be used and unless Linaro choose to distribute a kernel with the feature built in it is necessary to compile your own from sources.

The default install has a linaro user which I renamed to my user and ensured all the ssh keys and passwords on the system were changed. This is an important step when using this pre-supplied images as often a booted system is identical to every other copy.

To access mass storage my only option was via USB, indeed to add any additional expansion that is the only choice. The first issue here is that the USB host support is compiled in so when the host ports are initialised it is not possible to select a speed other than 12MBit. The speed is changed to 480Mbit by using a user space application found in the users home directory (why this is not a tool provided by a package and held in sbin I do not know).

When the usb_speed tool is run there is a chance that the previously enumerated devices will be rescanned and what was /dev/sda has become /dev/sdb if this happens there is a high probability that the system must be rebooted to prevent random crashes due to the "zombie" device.

Because the speed change operation is unreliable it cannot be reliably placed in the boot sequence so this must be executed by hand on each boot to get access to the mass storage.

NetSurf project already uses a x86_64 virtual host system which runs an LLVM physical volume from which we allocate logical volumes for each VM. I initially hoped to do this with the hikey but as soon as I tried to use the logical volume with a VM the system locked up with nothing shown on console. I did not really try very hard to discover why and instead simply used files on disc for virtual drives which seemed to work.

To provide reliable network access I used a USB attached Ethernet device, this like the mass storage suffered from unreliable enumeration and for similar reasons could not be automated requiring manually using the serial console to start the system.

Once the system was started I needed to install the guest VM. I had hoped I might be able to install locally from Debian install media as I do for x86 using the libvirt tools. After a great deal of trial and error I finally was forced to abandon this approach when I discovered the Linaro kernel is lacking iso9660 support so installing from standard media was not possible.

Instead I used the instructions provided by Leif Lindholm to create a virtual machine image on my PC and copied the result across. These instructions are great except I used version 2.5 of Qemu instead of 2.2 which had no negative effect. I also installed the Debian backports for Jessie to get an up to date 4.3 kernel.

After copying the image to the Hikey I started it by hand from the command line as a four core virtual machine and was successfully able to log in. The guest would operate for up to a day before stopping with output such as

$
Message from syslogd@ciworker13 at Jan 29 07:45:28 ...
 kernel:[68903.702501] BUG: soft lockup - CPU#0 stuck for 27s! [mv:24089]

Message from syslogd@ciworker13 at Jan 29 07:45:28 ...

 kernel:[68976.958028] BUG: soft lockup - CPU#2 stuck for 74s! [swapper/2:0]

Message from syslogd@ciworker13 at Jan 29 07:47:39 ...
 kernel:[69103.199724] BUG: soft lockup - CPU#3 stuck for 99s! [swapper/3:0]

Message from syslogd@ciworker13 at Jan 29 07:53:21 ...
 kernel:[69140.321145] BUG: soft lockup - CPU#3 stuck for 30s! [rs:main Q:Reg:505]

Message from syslogd@ciworker13 at Jan 29 07:53:21 ...
 kernel:[69192.880804] BUG: soft lockup - CPU#0 stuck for 21s! [jbd2/vda3-8:107]

Message from syslogd@ciworker13 at Jan 29 07:53:21 ...
 kernel:[69444.805235] BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]

Message from syslogd@ciworker13 at Jan 29 07:55:21 ...
 kernel:[69570.177600] BUG: soft lockup - CPU#1 stuck for 112s! [systemd:1]

Timeout, server 192.168.7.211 not responding.

After this output the host system would not respond and had to be power cycled never mind the guest!

Once I changed to single core operation the system would run for some time until the host suffered from the dreaded kernel OOM killer. I was at a loss as to why the oom killer was running as the VM was only allocated half the physical memory (512MB) allowing the host what I presumed to be an adequate amount.

By adding a 512MB swapfile the system was able to push the few hundred kilobytes it wanted to swap and the system was now stable! The swapfile of course has to be started by hand as the external storage is unreliable and unavailable at boot.

I converted the qemu command line to a libvirt config using the virsh tool
virsh domxml-from-native qemu-argv cmdln.args

The converted configuration required manual editing to get a working system but now I have a libvirt based VM guest I can control along with all my other VM using the virt-manager tool.

This system is now stable and has been in production use for a month at time of writing. The one guest VM is a single core 512MB aarch64 system which takes over 1100 seconds (19 minutes) to do what a Banana Pi 2 dual core 1GB memory 32bit native ARM system manages in 300 seconds.

It seems the single core limited memory system with USB SATA attached storage is very, very slow.

I briefly attempted to run the CI system job natively within the host system but within minutes it crashed hard and required a power cycle to retrieve, it had also broken the UEFI boot. I must thank Leif for walking me through recovering the system otherwise I would have needed to start over.

Conclusions

I must stress these conclusions and observations are my own and do not represent anyone else.

My main conclusions are:

  • My experience is of a poorly conceived, designed and implemented product rushed to market before it was ready.
  • It was the first announced 64bit ARM single board computer to market but that lead was squandered with issues around availability, reliability and software.
  • Value for money appears poor. Product is £70 plus an additional £50 for USB hubs, power supplies, USB Ethernet and USB SATA. Other comparable SBC are around the £30 mark and require fewer extras.
  • The limited I/O within the core product yields a heavy reliance on USB
  • The USB system is poorly implemented resulting in large additional administrative burdens.
  • Limited memory of 1Gigabyte reduces the scope for making use of the benefits of 64bit processors.
  • Recent change to UEFI bootloader is very welcome and something I would like to see across all aarch64 platforms. This is the time for there to be a single unified boot method, having to deal with multiple bad copies of bootloaders in the 32bit world was painful, perhaps this mistake can be avoided in the future.
  • The kernel provision is abysmal and nothing like the quality I would expect from Linaro. The non-upstream kernel combined with missing features after almost a year of development is inexplicable.
  • The Pine64 with 2G of memory and the Raspberry Pi 3 with its de-facto standard form factor are both preferable despite their limitations in other areas.
This project has taken almost a year to get to the final state and has been one of the least enjoyable within that time. The only reason I have a running system at the end of it is sheer bloody mindedness because after spending hundreds of hours of my free time I was not prepared to see it all go to waste.

To be fair, the road I travelled is now much smoother and if the application is suited to having a mobile phone without a screen then the Hikey probably works as a solution. For me, however, the Hikey product with the current hardware and software limitations is not something I would recommend in preference to other options.