Showing posts with label Arch. Show all posts

2020-05-06

Beware of ahci module change from dynamic to built-in on Artix (Arch) Linux


Recently, after one of the usual system updates, I suddenly ended up with an unbootable system on my Artix based NAS server. Since the system's boot process is not the most stable in general, I initially thought it was yet another "moody" day caused by the Marvell controller on the ADPE4S-PB daughterboard. In that case, however, the boot usually succeeds after several attempts. If not, it was occasionally caused by failed initramfs generation during the update process; in such situations I regenerated it by booting into the Artix (previously Arch) installation system and chrooting into the main system (or by using the fallback initramfs boot option). This procedure requires reconnecting the system drive to the VIA based controller. Nevertheless, nothing was bringing it back to life, and I started to suspect that I was dealing with a new issue this time. My initial speculation pointed to failing old hardware, but in the end it turned out to be a software issue. Since I could still successfully boot the system from the VIA integrated controller, it was becoming obvious that the module configuration was not being applied for some reason. I confirmed that by looking at the lspci output, which didn't show the AHCI driver being used for the Marvell controller.


A little investigation revealed that the new kernel has the ahci.ko.xz and libahci.ko.xz module files missing in the /usr/lib/modules/<kernel version>/kernel/drivers/ata directory. They were present there before the upgrade. Since I depend on an AHCI driver specific property, the reason for the failure was pretty clear. Despite that, I didn't yet know why those modules were missing. Regardless, I was looking for the fastest way to restore my system first. The first solution I came up with was to revert to the previous kernel. That was possible because the pacman package manager keeps older packages in its cache. Running "pacman -U /var/cache/pacman/pkg/linux-5.5.10.artix1-1-x86_64.pkg.tar.xz" downgraded the kernel and, fortunately, the system was bootable again.
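Should a downgrade be needed again, the cached package can be located and installed like this (a minimal sketch; the exact file name depends on the version previously installed):

ls /var/cache/pacman/pkg/ | grep '^linux-'
sudo pacman -U /var/cache/pacman/pkg/linux-<previous-version>-x86_64.pkg.tar.xz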

The downgrade solution was supposed to be temporary, since I can't ignore upgrades forever. Initially, I assumed that the missing modules were a mistake, so I filed a bug report. However, it was soon closed with the explanation that the ahci modules are now built into the kernel and that it's not a bug. I believe this change happened starting with the 5.6 kernel series, since 5.5 based kernels were still working for me. Because of this, I made an attempt to apply the equivalent of my dynamic module configuration to the built-in module. Thanks to the good Arch Linux online documentation, it was easy to find the required information. This page describes how to pass module parameters to the kernel and this one describes the GRUB bootloader configuration. To pass a module parameter, "module.param_name=param_value" needs to be added to the kernel command line. Blacklisting is performed by passing the "module_blacklist=module_name" parameter. In my specific case, I needed to pass the marvell_enable=1 parameter to the ahci driver and blacklist the pata_marvell module. So, these steps needed to be performed in my case:
  • sudo vi /etc/default/grub
    • change GRUB_CMDLINE_LINUX line to:
    • GRUB_CMDLINE_LINUX="ahci.marvell_enable=1 module_blacklist=pata_marvell"
  • regenerate grub.cfg by running sudo grub-mkconfig -o /boot/grub/grub.cfg
It should be safe to keep both the GRUB configuration and the previous configuration for dynamic modules. They should not interfere with each other; whichever does not apply (built-in or dynamic module) is simply ignored.
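Whether the parameter actually reached the built-in driver can be verified after boot (a sketch based on standard procfs/sysfs locations, assuming the ahci parameter is exported there):

cat /proc/cmdline                                  # confirm the new kernel parameters were passed
cat /sys/module/ahci/parameters/marvell_enable     # should print 1 if the option was applied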

Unfortunately, this solution didn't work as well as expected. Though I was able to boot into the system, the success ratio decreased to an unbearable level. Only 1 out of 5 to 7 attempts was even partially successful. By that, I mean the system booted and I could interact with it, however, none of the attempts initialized all the hard drives correctly. Either the system drive or one of the other two hard drives in the LVM RAID was failing. Thus, the RAID volume wasn't mounted at best, or the system failed to boot at worst. Quite often, the system disk was not recognized early in the boot process, leading to the rescue shell. Because of that, I was forced to revert to the old kernel again.

Considering that I didn't manage to fix the issue, I am not sure if there is a workable and stable solution at this point. A manually built kernel with a modularized ahci module might help, but it would mean that I would need to track kernel upgrades myself. Moreover, building the Linux kernel is not as trivial a process as I wish it were, so it is not a viable option for me. As a temporary solution I can disable Linux kernel upgrades completely and keep the other software up-to-date. However, as a long term solution, it may force me to look for another distribution which still builds AHCI as a module, or even to consider a complete hardware update. Only time will tell which one will be easier to implement.
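Holding back only the kernel while keeping the rest of the system current can be done with pacman's IgnorePkg option (a sketch; adjust the package names if a different kernel flavour is installed):

# /etc/pacman.conf
[options]
IgnorePkg = linux linux-headers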

In conclusion, if an Arch Linux based system is used along with the Marvell 88SE6145 SATA controller (or any other Marvell controller which requires ahci module specific configuration), I currently advise refraining from upgrading to the 5.6.x kernel. One can try to experiment with the kernel parameters as described above, however, it is advisable to make a backup image of the system beforehand, so it can be easily restored. The downgrade path may not always be successful because of other dependencies and should not be relied on.

2019-11-01

Migration from Arch Linux to Artix Linux

Recently I reached the point where my 3TB hard drives were getting almost full and I needed an upgrade. As a result, I opted for two Toshiba MG04ACA600E 6TB hard drives, which replaced the Toshiba DT01ACA300 and Seagate Barracuda ST3000DM001 3TB models. They were successfully recognized by the Jetway JNF-76 motherboard on which my NAS server is based. As a matter of fact, the VIA AHCI/RAID firmware always properly displays them in SATA II mode (the maximum speed for all VIA controllers AFAIK), which wasn't the case with the previous setup, when one of the drives would randomly drop to SATA I. In general, there is nothing to complain about; hopefully they will be reliable too.

Toshiba MG04ACA600E

This upgrade and a few unexpected free days made me rethink both the software and hardware configurations again and come back to the original question of which Linux distribution to use. Probably an additional kick was the news about Project Trident's decision to move from FreeBSD-based TrueOS to Void Linux, which made me revisit some lightweight distributions, init systems and file systems. I was even considering trying NetBSD with OpenZFS, but this plan quite soon faded away due to my lack of experience in managing the OpenZFS filesystem. Initial testing showed that I could easily destroy my data, and before such a move I should be really confident with it. Even though I was very much satisfied with the Arch Linux setup too, I was itching to try some systemd-free distribution instead, which would give me a more natural and familiar way of working. The relatively recent consolidation of the Arch and Manjaro OpenRC spinoffs into Artix Linux caught my attention for obvious reasons. It is still based on Arch, which provides a familiar environment, spirit and tools. The distribution officially supports the OpenRC and runit init systems. I chose OpenRC, created a migration plan, and started the journey.

First boot

To make sure that it boots and recognizes my hardware, I downloaded the latest artix-base-openrc-20191009-x86_64.iso image which I could find at the time of migration on one of their mirrors. The system booted without issues and recognized all the required hardware; the pata_via module was probably compiled into the kernel, since I didn't need to configure it separately, contrary to Arch Linux. The setup included JFS and XFS mount support, which was required for my data disks. This reassured me to go forward, since I didn't find any potential setbacks.

The plan

To minimize a potential regression impact, I decided to prepare a transition plan. It wasn't very detailed, since it did not describe all the specific commands, but it pointed out the general actions I would need to perform:
  • Backup /home and /etc/fstab
  • Backup samba, crontab, modules and network settings.
  • Boot into the Artix setup and back up the Arch system disk (it has dd included).
  • Install Artix itself
  • Install/configure required software and services: samba, wget, rsync, git, fossil, jfsutils, xfsprogs, rtorrent, lm_sensors, hdparm, hdtemp, ethtool, vim, vi, man, openssh.
  • Restore /home, crontab
  • Configure network
  • Configure mounting points to data drives
  • Install/configure powerd

Migration

The initial step was to prepare backups of all settings, the /home folder and the Arch Linux system disk itself. Individual settings files were simply copied manually one by one, and the /home folder was synced using the rsync utility. For the Linux system disk backup I booted into the Artix setup and used the dd utility together with gzip to copy the whole system disk (dd if=/dev/sda conv=sync,noerror bs=64K | gzip -c > /mnt/nasbackup.img.gz). This step probably took the longest: preparing a compressed image of the 240GB SSD drive took way longer than I was expecting (hours). Probably it was related to the limited performance of the system itself.
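For reference, restoring such an image is essentially the reverse pipe (a sketch assuming the same device name and image path as above; double-check the target device before running it):

gunzip -c /mnt/nasbackup.img.gz | dd of=/dev/sda bs=64K status=progress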

Once all backups were ready, I moved on to the installation part. The Artix installation process was based on the steps from the distribution's wiki Installation and Migration sections, so I won't go into much detail. It was an almost smooth process, except that the base installation didn't have any editor installed, which led me to install the Vim editor first, before I could do any required configuration (right after the chroot command). Probably the only mistake in following the documentation was the NetworkManager installation, where I should have preferred Gentoo's netifrc modules with the dhcpcd utility instead.

Rebooting into the newly installed system gave a few more surprises on how minimal the setup was. Even quite basic applications like hostname or man were not available. This was easily solved by installing the inetutils (for hostname) and man packages using the pacman package manager, but it still felt a bit strange, since I believe the Arch base install had more software. The list of my installed packages includes:

sudo pacman -S wget rsync git fossil jfsutils xfsprogs rtorrent lm_sensors hdparm hdparm-openrc hddtemp ethtool vim vi man openssh openssh-openrc samba samba-openrc cronie cronie-openrc powerd powerd-openrc inetutils

Besides that I also installed grub, os-prober, networkmanager and networkmanager-openrc during the initial installation process. However, as mentioned above, I may replace NetworkManager with netifrc modules and the dhcpcd utility in the future.

As you may have noticed, all services have an -openrc counterpart for OpenRC service management. OpenRC service files are installed in /etc/init.d for execution and /etc/conf.d for configuration. After that, each service needs to be added to a runlevel (I used default for all) using the rc-update add command, and services can be started with the rc-service command. The services I have added include:

sudo rc-update add NetworkManager default
sudo rc-update add sshd default
sudo rc-update add smb default
sudo rc-update add cronie default
sudo rc-update add cpupower default 
sudo rc-update add hdparm default
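A freshly added service can also be started immediately and the overall state checked without rebooting (standard OpenRC commands; sshd is just an example):

sudo rc-service sshd start
rc-status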

The next step was to configure the network aggregation. Since I installed NetworkManager, I needed to use the nmcli utility to set up the bonding network interface (ensure that all interfaces are down and leases are deleted from /var/lib/NetworkManager/*.lease before proceeding):

sudo nmcli con add type bond ifname bond0 bond.options "mode=802.3ad"
sudo nmcli con add type ethernet ifname enp4s4 master bond0 
sudo nmcli con add type ethernet ifname enp4s6 master bond0
sudo nmcli con up bond-slave-enp4s4
sudo nmcli con up bond-slave-enp4s6

With the dhcpcd utility it would have been a slightly different story (ensure that all interfaces are down and all leases are deleted from /var/lib/dhcpcd/*.lease before proceeding):

sudo pacman -S dhcpcd dhcpcd-openrc
sudo rc-update add dhcpcd
sudo ln -s net.lo net.bond0
vim /etc/conf.d/net
    config_enp4s4="null"
    config_enp4s6="null"
    slaves_bond0="config_enp4s4 config_enp4s6"
    config_bond0="dhcp"
    mode_bond0="802.3ad"

Finally, I completed my setup by restoring my home folder content, crontab and samba configuration files. The migration ended up being easy and problem-free. With backup time excluded, and ignoring some confusion during network configuration, it probably didn't take me more than a few hours of work.

There are some caveats which I noticed, though I am clueless as to their cause and whether they actually affect the system in any way:
  • Multiple segfault messages in dmesg (not sure yet what causes them, but they don't seem to affect any of the required functionality): openrc-run.sh[2008]: segfault at 8 ip 00007fdfbfb890dc sp 00007ffd4de79040 error 4 in libc-2.30.so[7fdfbfb37000+14d000]
  • A warning message for bond0: "Warning: No 802.3ad response from the link partner for any adapters in the bond". This may indicate a router issue or a premature warning, but apparently the bonding interface works correctly (and the router indicates it as active).
In conclusion, I can say that the OpenRC init system actually feels more intuitive and simpler than systemd, but the maturity of the Artix distribution is still not on par with systemd-based Arch Linux. Errors in boot messages, inconsistencies and confusing information in the wiki pages, and even the web page itself, are signs of that. Nevertheless, for my simple NAS server requirements this distribution seems like a perfect fit, and I believe I will stick with it unless it fails to prove itself stability-wise. Arch Linux was very stable and polished in this regard despite its rolling release nature; I expect no less from Artix. If anybody asked me for a recommendation, I would still recommend the Arch Linux distribution, unless that person preferred a systemd-free distribution for whatever reason (in that case Gentoo and Void would be the primary choices, I guess). Yet, I believe Artix has the potential to become one of the strongest lightweight distributions, especially for those seeking simplicity and freedom of choice.

Planned hardware changes

Though the migration went successfully, I still have some possible hardware changes in mind. First of all, the VIA based PCI controller is not compatible with the SSD I am using, which triggers a workaround in the driver that reduces the transfer rate to ~60 MB/s:

ata3: Incompatible drive: enabling workaround. This slows down transfer rate to ~60 MB/s
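A quick way to confirm what the drive actually delivers under this workaround is a raw read benchmark with hdparm, which is already on my package list (a generic example; the device name depends on how the SSD is enumerated):

sudo hdparm -t /dev/sda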

Secondly, I am planning to reuse the replaced hard drives as yet another backup solution in a RAID0 setup (which will make them 6TB in total).

Because of these changes, I am considering reverting to the ADPE4S-PB SATA II daughterboard. It means checking its compatibility with Artix (hopefully the same or a similar configuration as on Arch Linux will work), but I fear for the boot process stability, which improved considerably once I switched to the Delock controller. However, the daughterboard can provide enough SATA interfaces for the system drive and two hard drives without hindering overall system performance too much (it is based on the faster PCI-E interface and supports SATA II 3 Gbit/s transfer speed). In addition, I will need some creativity in attaching those two additional drives, since I don't have space in the mini-ITX case anymore. Probably I will need to keep them outside the case by using some extensions to reach the hard drive SATA connectors. The main concern is that the SATA power connector is too short to go outside the case; I am pinning my hopes on Molex to SATA adapters. For keeping the drives outside, I will try to reuse my old external 3.5'' HDD cases. We will see how this goes...

To keep the network aggregation I also bought a cheap D-Link DGE-528T (Rev. C1) network card, with high expectations that it will work with the daughterboard, unlike the Intel PRO/1000 GT card, since it should not have any interfering firmware. It is based on the Realtek RTL8169SC controller, which I would pair up with the integrated Realtek RTL8111C controller. Performance-wise, I do expect some degradation compared to the AD3INLANG-LF daughterboard, but this is acceptable.

D-Link DGE-528T
I am planning to write a new article if this project goes successfully.

2018-12-10

Network aggregation project on Jetway JNF76 motherboard


Prehistory

Since my NAS revival project last year, I already had in mind to try the link aggregation feature, which is supported by my router. Initially I expected simply to pair up the integrated Realtek RTL8111C NIC controller with an Intel PRO/1000 GT PCI adapter, as I already had one in the pile of my parts. For this purpose I even bought an Akiwa GHP-PCI106H PCI riser card (I use their GHB-B05 case), which wasn't so trivial to find - it was available only in several industrial PC online shops and, because they sell exclusively to legal entities, I needed to go through an intermediary to buy it. When everything was ready, I believed that it would be a relatively straightforward setup. However, it turned into a much longer story than expected.

Akiwa GHP-PCI106H PCI riser card

Initial plan failed

The aforementioned configuration failed immediately after I attached the PRO/1000 GT adapter. I hit an old JNF76 motherboard BIOS issue: expansion cards with their own firmware may clash with the ADPE4S-PB SATA II daughterboard, rendering the system completely unbootable. Intel's card has such firmware, with PXE boot support. This limitation has never been fixed by Jetway, thus I know of no solution to the situation. Unfortunately, it was clear that this configuration was a no go.
Intel PRO/1000 GT PCI network adapter

Looking for new solutions

One hiccup didn't deter me from completing the project though. It pushed me to look for other possible options to work around the limitation. Most likely the simplest way would have been to find another NIC adapter which wouldn't clash with the daughterboard. That meant buying a card without PXE boot support. As proof of that, I successfully tested an old VIA based Fast Ethernet (100Mbps) PCI network adapter. Indeed, it didn't clash with the ADPE4S-PB during the BIOS POST process and it worked perfectly in the Linux environment. Although this option looked quite appealing initially, there were only a few Realtek based GbE adapters on the market, their descriptions tended to be vague at times, and I felt that I might end up buying yet another incompatible card. More than that, the whole link aggregation idea was also a bit risky and might have left me with a second GbE network adapter buried deep in the pile of old parts. All of this made me rethink the NAS configuration a bit, in a way that would also solve another outstanding issue.
 
ADPE4S-PB SATA II controller

Reasoning for ADPE4S-PB replacement

The main reason for the configuration change was the same ADPE4S-PB daughterboard. Besides the clashes with expansion cards and a somewhat challenging configuration, I was also experiencing another very annoying issue with it. I used this controller to attach only the SSD with the host OS. Relatively soon after I made it work and boot the system, I started to face unstable behavior during the Linux boot process. From time to time Arch Linux failed to initialize the SSD, forcing me into a reset loop until a successful attempt. A few times I spent up to an hour clicking the reset button... I haven't figured out the actual reason for this, but it could have been anything from the driver to a bad cable connection (my SSD's SATA plug is physically damaged) or an improperly attached daughterboard. This led me to the decision to replace the ADPE4S-PB board with a LAN expansion daughterboard instead. I chose the AD3INLANG-LF module, which hosts three Intel 82541PI based network controllers, the same as used by the PRO/1000 GT adapter. It was readily available through Amazon, but I needed some help from a friend in the US to get it delivered to Europe. For the SSD drive I decided to use my old Delock PCI USB/SATA combo card.

AD3INLANG-LF daughterboard

Inserting AD3INLANG-LF daughterboard

Since I attached the SATA daughterboard many years ago, I had already forgotten how tricky the process of inserting it into the proprietary JWDB header actually is. For some weird reason Jetway doesn't provide a manual, and you can find at least a few people complaining about the attaching process in product reviews, blog posts or forums, some even taking quite extreme measures to make it seat properly. If it's not seated properly, the daughterboard will either fail to work completely, or it can even light up (for example, a green light on the LAN port if a cable is attached) but none of the interfaces will be identified or initialized in the OS or BIOS. Nevertheless, the actual process shouldn't require much force. Using a trial-and-error approach, I managed to insert it properly by aligning the inner pins (meaning the ones closer to the CPU side) with the plug holes on the daughterboard at a 30-45 degree angle and sliding it with a circular motion and slight force into the second row. You can actually feel a small resistance when it goes into the header. Once properly seated, you can even try to lift it up gently; it shouldn't move out of place or fall to any side under its own weight. Also, you can check the pins under the daughterboard: they should be evenly aligned on both sides and almost touching the header's plastic.
JWDB connector on ADPE4S-PB daughterboard





Finally, you can enter the BIOS setup and check if three new boot ROM options for the LAN interfaces (Addon Intel Lan1 to Lan3) appeared in the Integrated Peripherals -> Onboard Device Function section (not sure if they appear for the Realtek based daughterboard as well). In case of the SATA daughterboard, it would show its own BIOS screen for the RAID setup.
New entries in BIOS after inserting AD3INLANG-LF daughterboard

Delock PCI USB/SATA combo card

The 89140 Delock PCI combo card USB2.0/eSATA/ATA has 1xSATA, 1xeSATA, 1xIDE and 4xUSB 2.0 ports. SATA/IDE is based on the VIA VT6421A SATA RAID controller, and USB 2.0 is managed by the VIA Vectro VT6214L USB host controller. Similar to the Marvell controller on the ADPE4S-PB daughterboard, VIA VT6421A doesn't have built-in support in Arch Linux and requires loading an additional kernel module. Some resources mention the satavia name, however Arch Linux uses sata_via with the underscore. So I modified the /etc/mkinitcpio.conf file by adding the sata_via module to the MODULES=(sata_via) line and regenerated the initramfs (mkinitcpio -g /boot/initramfs-linux.img). After the reboot, the controller was successfully recognized and it was now possible to boot the OS from it. The USB controller was recognized without any additional configuration, but booting the OS from it is not supported (at least not on the JNF76 motherboard). Although it may seem like a drastic downgrade from an already subpar SATA II controller, SSD performance is still significantly faster than any USB 2.0 flash drive I have used before for the same purpose (it reaches ~80MB/s for writing and reading in my testing). Most importantly, I haven't faced SSD initialization issues with it, which saves a lot of headache during the system reboot process. Unfortunately, I can't make a comparison to ADPE4S-PB, as I have never done or recorded performance tests for it, and I only realized that after finishing my current setup. As mentioned in the previous chapter, reattaching modules is quite a complicated and time-consuming process, so I am not keen to do it just for testing purposes.
Delock PCI USB/SATA combo card
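For completeness, the change described above boils down to two steps (a sketch; keep any other entries you already have in the MODULES array):

# /etc/mkinitcpio.conf
MODULES=(sata_via)

sudo mkinitcpio -g /boot/initramfs-linux.img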

AD3INLANG-LF speed test without link aggregation

As described previously, the AD3INLANG-LF daughterboard consists of three Intel 82541PI Ethernet controllers. It is a 32-bit 3.3V PCI 2.3 based controller, which supports 33MHz and 66MHz speeds. The controllers were identified by Linux as the GI version instead (the difference is the manufacturing stepping of the controller: GI is the B1 stepping, PI the C0 stepping). lspci -v output for one of the controllers can be seen below:

04:04.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
      Subsystem: Intel Corporation PRO/1000 MT Network Connection
      Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 18
      Memory at dfac0000 (32-bit, non-prefetchable) [size=128K]
      Memory at dfaa0000 (32-bit, non-prefetchable) [size=128K]
      I/O ports at 9c00 [size=64]
      [virtual] Expansion ROM at dfa00000 [disabled] [size=128K]
      Capabilities: [dc] Power Management version 2
      Capabilities: [e4] PCI-X non-bridge device
      Kernel driver in use: e1000
      Kernel modules: e1000

While my external Intel PRO/1000 GT card uses a 33 MHz interface, all three controllers in the daughterboard are attached to the 66 MHz PCI bus. Many years ago I thought that the Jetway proprietary interface was just a modified conventional PCI, but apparently it is bridged to the PCI-E interface, which allows combining a few PCI interfaces under one connection.
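If you are curious about the bridge layout, the PCI topology can be inspected with standard lspci invocations (generic commands; the exact output depends on the system):

lspci -tv              # tree view showing which devices sit behind which bridge
lspci -v -s 04:04.0    # verbose details for a single device, as in the listing above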

I tested the speed by copying one 1.6 GB file through samba 4.8 (Arch Linux to Manjaro Linux). My main computer was using a 10-Gbit Tehuti 4010 PCI-e x4 rev2 based controller (Edimax EN-9320SFP+) connected to the same router as the NAS server (using a 10G SFP+ DAC cable). The drive in use was a portable Transcend Storejet 500 SSD connected to a USB 3.0 port (internal reading speed 6.9 GB/s, writing 2 GB/s). On the NAS side a TOSHIBA DT01ACA300 3TB hard drive was attached to the native VX800 SATA II controller without any RAID setup (reading ~137MB/s, writing ~98.5 MB/s). Both systems were using the ext4 filesystem:

Integrated (Realtek) interface: 53.8 MB/s

Inner (left) interface: 80.7 MB/s
Middle interface: 82 MB/s
Outer (right) interface: 80 MB/s

The left and right interfaces of the daughterboard showed a similar constant ~80MB/s speed, while the middle interface was marginally faster at around 82 MB/s. The integrated RTL8111C interface copied the file at 53.8 MB/s.

Edimax EN-9320SFP+

Link aggregation

Finally it was time to set up link aggregation. I enabled the 802.3ad link aggregation feature on my router and was ready to start the NAS configuration. However, just when I thought that the worst times were behind me, I stumbled onto bonding driver configuration issues, which led me to try link aggregation on NetBSD first.

NetBSD

Since I wanted to confirm that my Linux configuration struggles were not caused by the router, I checked if NetBSD supports link aggregation. It appeared that it does, and the setup process is relatively easy. As I always have a NetBSD system installed on a WD Elements Portable (10A8) USB hard drive (NetBSD 8.0 at the time of testing), I just plugged it into the system's USB port and booted from it. Thankfully, the system loaded without any issues, just relatively slowly because of USB 2.0 speed limitations. I was following the official manual page and this article to set up link aggregation. There is no need to recompile the kernel, as the default (GENERIC) amd64 kernel already has the agr pseudo-device enabled in its configuration. I literally repeated all the steps from the man page:
/etc/rc.d/dhcpcd onestop #stop the DHCP client
ifconfig wm0 inet xxx.xxx.xxx.xxx delete
ifconfig wm0 inet6 fe80::xxxx:xxxx:xxxx:xxxx delete
ifconfig wm1 inet xxx.xxx.xxx.xxx delete
ifconfig wm1 inet6 fe80::xxxx:xxxx:xxxx:xxxx delete
ifconfig agr0 create
ifconfig agr0 agrport wm0
ifconfig agr0 agrport wm1 
/etc/rc.d/dhcpcd onestart #start the DHCP client

Once the dhcpcd (DHCP client) service started, to my surprise the aggregated interface was correctly configured, and the router immediately changed the aggregation status to enabled. The network was working properly, thus I could access both my router and the Internet. There was no point in testing copying speeds, because the USB 2.0 interface was the major bottleneck, however I successfully managed to copy a few files to my main computer using scp. This proved that I was doing something wrong in the Linux environment and that link aggregation actually works between the NAS and the router.
ifconfig agr0 output

Arch Linux

Arch Linux has great documentation on many topics in its wiki, and bonding is no exception. Basically, the configuration is not complicated at all: just copy the /etc/netctl/examples/bonding file to /etc/netctl/bond0 and configure the aggregated interface according to the example. The only difference is to check what your interfaces are called (for example: ip addr list) and replace them in the BindsToInterfaces section accordingly. To be sure that I was configuring the right interfaces, I initially connected the cable to each of them one by one and checked which one was configured by DHCP. One additional change was the bonding mode. By default Linux uses the round-robin policy instead of 802.3ad, therefore the Mode=802.3ad line needs to be added as well. The final configuration looked like this:

 Description="A bonded interface"
 Interface=bond0
 Connection=bond
 BindsToInterfaces=(enp4s6 enp4s4)
 IP=dhcp
 Mode=802.3ad

After that I used the netctl enable bond0 and netctl start bond0 commands. At first glance everything seemed fine: the new bond0 interface actually appeared, it was up and the correct IP was assigned to it. However, pinging the router was failing and the aggregation status was still disabled. It took me two evenings to realize that the problem was with the DHCP client. This article was great in providing additional information on bonding and gave me a hint for troubleshooting the issue. The cause of the failing network was old lease files in the /var/lib/dhcpcd folder for the aggregated interfaces. Because of them, the DHCP client was configuring not only the bond0 interface but each aggregated interface separately as well. This confused the network and it was failing to route through the right interface. It was enough to delete all *.lease files and run the dhcpcd client manually for the bond0 interface. Since then it has been properly and automatically configuring the bond0 interface, even after reboots. Finally network aggregation was working as intended! It can be recognized by the same MAC address shared between the bond0 interface and the aggregated ones, the IP address assigned only to the bond0 interface, and "master bond0" in the aggregated interface description (ip addr list):

2: enp4s4: mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
3: enp4s6: mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 brd 192.168.1.255 scope global noprefixroute bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::230:18ff:fec4:568c/64 scope link
       valid_lft forever preferred_lft forever
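For reference, the lease cleanup and manual run described above amount to something like this (a sketch using the paths and interface name mentioned earlier):

sudo rm /var/lib/dhcpcd/*.lease    # remove stale leases for the individual interfaces
sudo dhcpcd bond0                  # let the DHCP client configure only the bonded interface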

Link aggregation speed testing

I performed the same test for the aggregated interface as I did for each NIC controller separately, by copying the same 1.6 GB file through samba. To my disappointment, network bonding actually added quite a big overhead and was way slower than each interface performing independently. In general, the copying speed was from 58 to 62.5 MB/s, which is just about 10-16% more than the integrated NIC controller and over 20% less than an independently working 82541PI controller. Copying 2 files at the same time reduces the speed almost by half for each file, so the total speed is still basically the same (possibly it's already a hard drive limitation as well). According to Wikipedia, 802.3ad gives fault tolerance and load balancing, however it does that at the expense of performance. If those features are important, then it's worth the hassle; however, if speed matters, the balance-tlb or balance-alb modes should possibly be investigated. Unfortunately I didn't have time to play with those modes yet, as they need a bit of a different setup (the router's link aggregation actually needs to be disabled). If I do, though, I will write a shorter article on that as well.
LEDs of aggregated NICs flashing at the same time

Conclusion

I started this project in the blind belief that bonding would give me higher speeds, thus I didn't spend enough time on research. Like a RAID setup, network aggregation also has different modes for different purposes. The 802.3ad standard is currently the one widely supported by routers. However, it is actually slower than an individual interface and provides fault tolerance and load balancing instead, which may or may not be important for your setup. If you are looking to increase your transfer speeds, the balance-tlb mode should probably be tested. But even in this case, you shouldn't forget that most switches, computers and routers are still limited to 1Gbit Ethernet, so a single peer can't actually saturate the doubled-speed link. Finally, if you are using a magnetic hard drive, it can be a bottleneck as well. Before setting up network bonding, you could also consider a RAID1+0 setup. In general, the link aggregation feature is not for everyone, and you should investigate whether it's worth your time, money and hassle. For me, it was mainly an interesting experience and a good reason to fix my outstanding NAS issues. This journey wasn't easy, but it gave me valuable information for future configurations and relieved me of headaches I had suffered from previously.

2018-01-06

Linux on Jetway JNF76-N1G-LF in 2017

Back in 2009 I bought the Jetway JNF76 motherboard together with the ADPE4S-PB daughterboard (4xSATA II Marvell controller) for its proprietary connection (Jetway Daughter Board Connector Gen 1). The motherboard is equipped with a VIA Nano U2300 1GHz CPU and the VX800 chipset. Eventually it replaced another board, the VIA VT-310DP, as my main NAS (network attached storage) server. At that time it took me quite a while to find a properly working setup. I tried all the BSDs and some Linux distributions, until I was forced to switch from my initial NetBSD 5.0 setup to Debian Linux 5. Refer to my previous articles by clicking here, here and here. Finally some outstanding issues were resolved during my transition to Debian 6 in 2011. The Debian Linux setup worked in my server for around 5 years, until the flash drive containing my OS installation died. A few weeks ago I decided to revive the system, but using a different, lighter Linux distribution. The following points are the issues I encountered during the time I was using Debian.
  • Debian Linux required additional configuration for drives connected to ADPE4S-PB controller to be recognized (adding ahci.marvell_enable=1 to some file in /etc/modprobe.d/). I was wondering if it is still the case today.
  • A06.1 BIOS version wasn't compatible with any Linux distribution I've tried at that time (boot process was hanging up before any boot messages were available on the screen), only falling back to A05 version was solving the issue.
  • IDE mode was needed to be used for native VX800 SATA controller for ADPE4S-PB daughterboard to initialize during PC boot process (BIOS issue).
  • Graphical install was working during that time.
During this endeavor I have tried to install the following Linux distributions:
I also tested live images of these distributions:

A06.1 Linux issue 

The board reached EOL long ago, however I was still expecting some BIOS upgrades after the A06.1 version from 2010. Back then I communicated with Jetway extensively regarding the Linux boot issue (I tried around 5 major distributions at that time, all of them failed to boot), however their answers didn't help me reach any conclusion. Eventually I just downgraded the BIOS back to A05. Unfortunately, visiting their current page showed that A06.1 is the last upgrade for this board, and no more upgrades can be expected. Out of curiosity I updated the BIOS again, and I can confirm that, at the time of writing this article, Linux boots as long as no USB devices are attached; otherwise the system crashes with a "Fatal exception in interrupt" message. If you attach a USB device after the system is loaded, it will work, but Linux can still crash at any given time with the same error (in my case during the system upgrade process). It is an improvement over an immediate crash without boot log messages, but it hardly justifies the BIOS upgrade. On the other hand, the BIOS release notes stated that it fixed the 2TB hard drive detection issue, however I didn't have any problems with >2TB drives on the A05 BIOS anyway (over the time I used at least three or four different models from Hitachi, Seagate, Toshiba and possibly WD). I would personally recommend upgrading to the latest BIOS version only if necessary.

LXLE and SparkyLinux

My initial goal was to use the CRUX distribution for the NAS server. It didn't have a live image and its installation image was pretty primitive, which led me to search for a light Linux distribution with a live image for initial testing. SparkyLinux and LXLE were the ones which caught my attention, because they both use light window managers by default (Openbox, LXDE). The former is based on the Debian testing branch, the latter on Ubuntu. Unfortunately, it appeared that graphics-based installation failed on both, which meant LXLE was not useful for my needs. Since SparkyLinux had console-based boot support, it proved to be quite a good "rescue" type distribution. It is important to note that hard drives attached to the ADPE4S-PB daughterboard weren't recognized by SparkyLinux; otherwise it worked OK on the system and provided enough tools for my testing needs.

CRUX Linux

I was really hoping to finally grasp this quite unique Linux distribution, which uses BSD style init scripts and ports style package management. Why? Mainly because, among all other Linux distributions, CRUX has the features closest to BSD, and I am more a BSD person than a Linux one.

Initially the setup image failed to boot, which was caused by an ASPM (Active State Power Management for PCI-E) issue. The solution was to add the pcie_aspm=off parameter on every boot (just type CRUX pcie_aspm=off in the boot console). Apparently other Linux distributions do that automatically depending on the detected PCI-E version (I noticed that in boot logs). CRUX installation is similar to, but even more primitive than, NetBSD or OpenBSD installation. You need to partition disks manually using fdisk, and manually edit fstab, hosts and other required configuration. Eventually you run the setup script on the root partition mounted to /mnt, which copies the system files to the required location. Then you chroot into your system root partition (setup_chroot), compile your own kernel and install a bootloader (lilo is the default, but I used grub2). This Linux distribution doesn't have an initial RAM disk (initrd) by default, so all drivers must be compiled into the kernel. This condition eventually became my brick wall. I just couldn't find a combination which could boot the system. Even when the partitions and hard drives were visible in the boot log (whether attached to the native VX800 controller or the daughterboard one), booting ended with a "VFS: Unable to mount root fs" error. After a week of recompiling different kernel configurations I decided to give up on CRUX.

antiX Linux

Based on the problems I encountered previously, I looked around for a faster solution and stumbled upon antiX Linux, which is a systemd-free Debian based system and supports console based installation. Installation went pretty smoothly, but I could not resolve the issue with drives attached to the ADPE4S-PB daughterboard. It is based on the Marvell 88SE6145-TFE1 controller and, similar to my installation back in 2009, it still required manually disabling the Marvell PATA driver in most Linux distributions. I just couldn't find out how to do this in antiX, and I didn't want to spend more time searching for answers on forums (I couldn't find the information I needed in old discussions, so getting it would have meant starting a new thread). For this reason I just decided to switch again.

Manjaro Linux

There were three reasons why I wanted to try Manjaro Linux:
  1. I have used Manjaro as my main distribution on my desktop system since the end of 2013, and I am pretty familiar with it by now.
  2. It is based on Arch, which had a good description on how to solve the Marvell SATA issue in its forums.
  3. Manjaro has a console-based Manjaro-Architect edition which gave me hope to run it successfully on the system.
Unfortunately though, the boot process ended up with a message on my monitor showing "Sync. Out of range", and I couldn't find a quick solution to that. Therefore, Manjaro failed as well, which made me decide to try Arch Linux.

Arch Linux

Arch Linux had a console-based installation and required internet access (at least the image I used did). It has a couple of helpful utilities which generate files like fstab, compared to the manual editing found in CRUX, although partition management is still manual using fdisk or cfdisk. Installation went pretty smoothly according to the Arch Linux documentation. I won't go into details, because this article is not about the Arch Linux installation process, but I will describe the specific changes I made to the configuration for my particular setup. Initially, I replaced the filesystem references from /dev/sd* to partition UUIDs in the fstab file. In addition, I also disabled the system beep by adding an /etc/modprobe.d/nobeep.conf file with the content "blacklist pcspkr". Finally, I performed the changes to enable Marvell SATA support (for the ADPE4S-PB daughterboard to work):
  • Added /etc/modprobe.d/blacklist_pata_marvell.conf
    • blacklist pata_marvell
  • Added /etc/modprobe.d/ahci-enable-marvell.conf
    • options ahci marvell_enable=1
  • Modified /etc/mkinitcpio.conf file
    • MODULES=(ahci)
  • Regenerated initramfs
    • mkinitcpio -g /boot/initramfs-linux.img
After these configuration changes and a system reboot, the boot disk can be connected to the daughterboard; the consolidated steps are sketched below.
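The whole change boils down to a few commands (a minimal sketch of the steps above; file names follow the ones I used, and the mkinitcpio.conf edit is done with any editor):

echo "blacklist pata_marvell" | sudo tee /etc/modprobe.d/blacklist_pata_marvell.conf
echo "options ahci marvell_enable=1" | sudo tee /etc/modprobe.d/ahci-enable-marvell.conf
# edit /etc/mkinitcpio.conf so that the array reads MODULES=(ahci)
sudo mkinitcpio -g /boot/initramfs-linux.img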

So far this installation runs OK. Arch Linux proved to be a relatively untroublesome distribution to install, and it fully utilizes my hardware.

Conclusions

  • IDE mode is still required for the native controller in BIOS if you want to use the ADPE4S-PB daughterboard (including on the latest A06.1 BIOS version). The LAN boot ROM can be enabled though.
  • Additional configuration is still necessary for Linux to see drives attached to the ADPE4S-PB daughterboard. More specifically, the Marvell PATA driver needs to be disabled and the ahci Marvell support needs to be enabled. It means you can't install the OS to a drive attached to this daughterboard, but the drive can be reattached to it after the necessary changes are performed.
  • Linux currently boots on the A06.1 BIOS version, but only without USB devices attached, therefore I personally recommend the A05 version.
  • It takes a hell of a lot of time to find the right distribution and configuration, mainly because of the SATA daughterboard. If you are not using it, most distributions should work out-of-the-box, as long as they offer a console-based installation. I couldn't boot into CRUX unfortunately, but its setup image works with pcie_aspm disabled, so probably I just couldn't find the right kernel/system configuration.
  • Arch Linux proved to be the most suitable lightweight distribution for utilizing the ADPE4S-PB daughterboard.
  • Graphical installation or the X server failed to work on any distro mentioned above (I'm not sure about the reasons; some definitely didn't have the openchrome driver, but I believe some did. Your experience may be different. Since this particular issue is irrelevant to my setup at the moment, I didn't investigate further. VX800 integrated graphics is supported by the openchrome driver, so theoretically it should work with light window managers).