Friday, 5 September 2025

BirdNet Systems Testing #1

I thought this would be easy

Just run BirdNet-Pi alongside BirdNet-Go

Then choose which system to use in the future.

Summary

I’m testing BirdNet-Go (tphakala) alongside BirdNet-Pi (Nachtzuster) to gain some idea of performance and system limitations.

I’m not sure what to read into these test results, as neither the repeatability of a single system nor the agreement between systems is great.

It looks like further testing and investigation are required, but here are my findings so far.

 

Notes on species

  • I use my own species labels, so you won’t see prefixes such as European, Eurasian or common, and ‘raven or dog’ detections are almost always due to a barking dog!
  • Although swifts are ‘out-of-season’ in September at this location, we have a swift caller still running within a swift box on the north wall of the house.


Test Configuration

This illustration shows the initial system layout.


 

The primary system, installed in the loft of our house, has been running BirdNet-Pi for a couple of years; during 2023/24 it ran the McGuire BirdNet-Pi software, but in December 2024 it was switched to the Nachtzuster system for 2025. It is essentially ‘always on’.

The electret microphone module (Panasonic WM-61A) is sunk into the end of a bamboo cane and pushed out through a gap between the tiled roof and the back wall (west facing) at a height of about 5m. The mic is sheltered by the roof overhang and picks up sound primarily from the rear garden, away from typical street noise.

The audio-USB module is my best-performing module: it’s a no-brand white module using a CM108 chip. The Raspberry Pi 4 is earthed via a mains earth connection to the body of one of the USB connectors (i.e. the grounded metalwork). This reduces electrical noise and practically eliminates 50 Hz audio mains hum.

This system provides an RTSP audio stream over wifi. To avoid any wireless issues disrupting one or other of the downstream test systems, both test systems take ethernet from the router.
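As a sanity check on the stream itself, a short sample can be captured and listened to. A minimal sketch using ffmpeg via Python’s subprocess; the URL is hypothetical and stands in for whatever address the loft system publishes:

```python
# Capture a 30-second sample from the RTSP stream for checking by ear.
# Requires ffmpeg on the PATH; the URL below is a placeholder.
import subprocess

RTSP_URL = "rtsp://192.168.1.50:8554/birdmic"  # hypothetical stream address

subprocess.run([
    "ffmpeg",
    "-rtsp_transport", "tcp",  # TCP avoids packet-loss artefacts over wifi
    "-i", RTSP_URL,
    "-t", "30",                # capture 30 seconds
    "-ac", "1",                # mono
    "-ar", "48000",            # 48 kHz sample rate
    "sample.wav",
], check=True)
```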

Although the 2 test systems have different amounts of RAM, this is not thought to influence test results, as even 2GB appears more than adequate to run BirdNet on a Pi 4.

Initial tests

BirdNet-Pi was installed on the 2GB Pi and BirdNet-Go on the 4GB Pi.

Each system was run with similar settings where possible, including:

  • location (Lat/Lon) & range: 0.03
  • Sensitivity: 1
  • Overlap: 1

Differences included:

  • Although all systems were using the same model (BirdNET_GLOBAL_6K_V2.4_Model_FP16, 2023), it was noted that the ‘Go’ system uses a newer bird species list, containing 143 current bird species, while the Pi systems have only 111.
  • Threshold: the primary system was left undisturbed at 0.8 (80%), while the test systems were set to 0.7, so that test-system results could be checked for detections just below the 80% confidence level while still treating 80% as the ‘required’ target.

The systems were run on 3rd September 2025 and the data was extracted from each of the 3 systems to CSV files using DB Browser for SQLite.
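For repeat runs the same extraction can be scripted rather than done by hand. A minimal sketch using Python’s sqlite3 module; the database, table and column names are those I find on BirdNet-Pi (birds.db, table ‘detections’), and BirdNet-Go keeps its own schema, so the query would need adjusting there:

```python
# Minimal export sketch: the same data pull as DB Browser, but scripted.
# Table/column names are from my BirdNet-Pi install (birds.db, table
# 'detections'); the date format is assumed to be 'YYYY-MM-DD'.
import csv
import sqlite3

conn = sqlite3.connect("birds.db")
rows = conn.execute(
    "SELECT Date, Time, Com_Name, Confidence FROM detections "
    "WHERE Date = '2025-09-03' ORDER BY Date, Time"
)

with open("primary_2025-09-03.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "time", "species", "confidence"])
    writer.writerows(rows)

conn.close()
```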


The results show some inconsistencies.

 


The 2 BirdNet-Pi systems recorded more detections than BirdNet-Go during the test period. This may not be a problem in itself; it is probably just down to how detections are counted on a given system, e.g. are 2 calls detected within a short time frame counted as 1 or as 2?
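One way to take that counting question out of the comparison would be to collapse each system’s detections into ‘events’ before comparing. A minimal sketch, assuming the CSV layout exported above and an arbitrary 60-second window:

```python
# Collapse each species' detections into 'events': a new event starts only
# when a call arrives more than WINDOW after the previous one for that
# species. Input is the CSV exported above; 60 s is an arbitrary choice.
import csv
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)

events = {}  # species -> list of event start times
with open("primary_2025-09-03.csv") as f:
    for row in csv.DictReader(f):  # rows are already time-ordered
        t = datetime.fromisoformat(f"{row['date']} {row['time']}")
        starts = events.setdefault(row["species"], [])
        if not starts or t - starts[-1] > WINDOW:
            starts.append(t)

for species, starts in sorted(events.items()):
    print(f"{species}: {len(starts)} event(s)")
```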

Although the 2 BirdNet-Pi systems had a similar number of total detections (36 & 37) and detected a similar number of species (10 & 11), the species lists do not exactly match: Hooded Crow, Golden Plover & Raven or Dog were reported by the primary system but not seen by the test systems, while Curlew, Great Crested Grebe & Jackdaw were seen by a test system but not by the primary.

Likewise, the BirdNet-Go system recorded slightly fewer species (8 rather than 10 or 11), but the species detected included Common Scoter (which was not seen by the others), and it did not detect Swift, although there was 1 detection in the 70-80% range.

If the primary system is seen to out-perform the test systems, it could be due to the ‘quality’ of the RTSP audio stream (although no audible issues have been noticed).

So I decided to run a series of repeatability tests.

This involved using my laptop to capture a real recording of about 40-45 minutes from the primary system’s stream. I then streamed this recording to the BirdNet-Go system.
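For anyone wanting to reproduce this, the replay step can be done with ffmpeg pushing the file to an RTSP server. A minimal sketch, assuming an RTSP server such as mediamtx is already listening on port 8554 of the laptop; file name and stream path are placeholders:

```python
# Replay the captured WAV as a 'live' RTSP stream for the test systems.
# Assumes an RTSP server (e.g. mediamtx) is listening locally on 8554.
import subprocess

subprocess.run([
    "ffmpeg",
    "-re",                      # read at native rate, i.e. play back 'live'
    "-i", "primary_capture.wav",
    "-c:a", "aac",              # encode audio for the RTSP stream
    "-f", "rtsp",
    "-rtsp_transport", "tcp",
    "rtsp://localhost:8554/replay",
], check=True)
```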

The test was repeated 5 times, then the overlap setting was changed and the tests repeated.

For Overlap=1

There is some variability in the number of detections: 11 to 13.

Species detection rates across the 5 runs:

  • wood-pigeon: 1/5 detections
  • blue tit: 2/5
  • collared-dove: 3/5

Note: although the confidence results are presented in rows, don’t assume that a row shows the variation of one particular call; my attempts to align the detections by time were not successful, i.e. there may have been (say) 10 swift calls on the recording, and the 4 detection figures given could have been for different individual calls each time!
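The n/5 figures above are easy to tally from the per-run CSV exports. A minimal sketch, with hypothetical file names, counting in how many runs each species appears at all:

```python
# Count in how many of the 5 runs each species appears.
# File names are hypothetical; each CSV is in the format exported earlier.
import csv

runs = [f"go_overlap1_run{i}.csv" for i in range(1, 6)]

counts = {}
for path in runs:
    with open(path) as f:
        # one count per species per run, however many detections it had
        for species in {row["species"] for row in csv.DictReader(f)}:
            counts[species] = counts.get(species, 0) + 1

for species, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {n}/{len(runs)} runs")
```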

Conclusion: repeatability is not great.

For Overlap=2

Clearly, both the number of detections and the number of species dropped with this increase in the overlap setting. This is very odd: a higher overlap means the 3-second analysis windows advance in smaller steps, so the model scores more windows and would be expected to yield more detections, not fewer.
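To put numbers on that expectation (assuming BirdNET’s usual 3-second window, with the overlap setting being the seconds shared by consecutive windows):

```python
# Number of 3-second analysis windows in a 45-minute recording for each
# overlap setting: the hop between windows is (3 - overlap) seconds.
DURATION = 45 * 60  # seconds
WINDOW = 3          # BirdNET analysis window, seconds

for overlap in (0, 1, 2):
    hop = WINDOW - overlap
    windows = (DURATION - WINDOW) // hop + 1
    print(f"overlap={overlap}: {windows} windows")
# -> overlap=0: 900, overlap=1: 1349, overlap=2: 2698
```

So at overlap=2 the model sees roughly three times as many windows as at overlap=0, which makes the drop in detections all the stranger.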

 

I need to do more testing, when I can find the time!

I probably need to make more use of recordings to check performance & repeatability, and maybe assess them first with BirdNET-Analyzer.
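For reference, a capture can be scored with BirdNET-Analyzer from the command line. A sketch of the sort of invocation I have in mind; the flags are ones I’ve seen in the repository’s analyze.py, but the CLI has changed between releases, so check --help, and the coordinates, week and paths here are placeholders:

```python
# Hypothetical invocation of BirdNET-Analyzer's analyze.py over the capture.
# Check --help on your version; location, week and paths are placeholders.
import subprocess

subprocess.run([
    "python", "analyze.py",
    "--i", "primary_capture.wav",
    "--o", "analyzer_results",
    "--lat", "53.0", "--lon", "-2.0",  # placeholder location
    "--week", "36",                    # early September
    "--overlap", "1",
    "--sensitivity", "1.0",
    "--min_conf", "0.7",
    "--rtype", "csv",
], check=True)
```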


2 comments:

  1. Hi, very interesting, thanks for sharing! I did much the same, running both birdnet-pi and birdnet-go in parallel in Docker containers fed by the same single RTSP feed. That allows running them for a long time (days or weeks) and comparing the results. In the end the two led to quite different results: birdnet-go has a feature that decreases the confidence threshold for repeated species, which amplifies the detection of common birds, while its false-positive feature discards birds heard in a single vocalization; this was especially visible during the night. I think both are very interesting anyway, and we are very lucky to have these tools.

  2. Many thanks for your comments, & sorry for the delay in picking this up.
    I must admit that I don't know what to do next. The variations in species detection when using the same test file a number of times do make me wonder whether either system can really be relied upon to provide useful information. I've found BNP data interesting for migrant species, but I have to remember that the system is already biased in most cases to reject species that are out-of-season.
    I guess I could continue testing to find 'good' settings for filtering & overlap on BNG.
    But yes I agree, we are lucky to have these interesting tools ...and just think what they may be capable of in another 2 or 3 years!
