Hi folks, I’ve written a Prometheus exporter for the BikePoint API to track station availability over time. The eventual goal is to raise notifications when a given station drops below a threshold of available bikes or docks.
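For context, here is a minimal, stdlib-only sketch of the parsing side of such an exporter. It assumes each element of the `https://api.tfl.gov.uk/BikePoint` response has an `id` and an `additionalProperties` list containing `NbBikes` and `NbEmptyDocks` keys. The metric names, the sample payload and the `below_threshold` helper are all hypothetical. The Prometheus text exposition format is rendered by hand only to keep the example dependency-free. A real exporter would normally use a client library such as the official Python `prometheus_client`.

```python
import json

# Hypothetical payload shaped like the assumed /BikePoint response; check a live response.
SAMPLE = """[
  {"id": "BikePoints_1", "commonName": "Example Station",
   "additionalProperties": [
     {"key": "NbBikes", "value": "7"},
     {"key": "NbEmptyDocks", "value": "12"},
     {"key": "NbDocks", "value": "19"}]}
]"""


def parse_bikepoints(payload: str) -> dict:
    """Map each station id to its bike and empty-dock counts."""
    stations = {}
    for place in json.loads(payload):
        # additionalProperties is assumed to be a list of {"key": ..., "value": ...}.
        props = {p["key"]: p["value"] for p in place.get("additionalProperties", [])}
        try:
            stations[place["id"]] = {
                "bikes": int(props["NbBikes"]),
                "empty_docks": int(props["NbEmptyDocks"]),
            }
        except (KeyError, ValueError):
            continue  # skip stations with missing or non-numeric counts
    return stations


def render_metrics(stations: dict) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for metric, field, help_text in (
        ("bikepoint_bikes_available", "bikes", "Bikes available at the station."),
        ("bikepoint_empty_docks", "empty_docks", "Empty docks at the station."),
    ):
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} gauge")
        for station_id, counts in sorted(stations.items()):
            lines.append(f'{metric}{{station="{station_id}"}} {counts[field]}')
    return "\n".join(lines) + "\n"


def below_threshold(stations: dict, min_bikes: int = 2, min_docks: int = 2) -> list:
    """Hypothetical check: station ids short of bikes or of empty docks."""
    return sorted(
        station_id
        for station_id, counts in stations.items()
        if counts["bikes"] < min_bikes or counts["empty_docks"] < min_docks
    )


if __name__ == "__main__":
    print(render_metrics(parse_bikepoints(SAMPLE)), end="")
```

The threshold check is shown only to illustrate the comparison. In practice, the notification side would more naturally live in a Prometheus alerting rule over the exported gauges.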
Unfortunately, the values reported for a station sometimes oscillate between two levels when graphed over time.
My hypothesis is that this happens when a subset of backend processes stops receiving new data and keeps serving the same values over and over again. From that point on, each API call randomly receives either current or stale numbers, which produces the oscillation when graphed over time. This would explain why the “wrong” value is a constant equal to the correct value at the moment the pathological behaviour starts.
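The hypothesis can be made concrete with a small simulation. This is a sketch under assumed conditions, not a model of the real BikePoint infrastructure. It uses a hypothetical pool of identical replicas behind a load balancer that picks one uniformly at random per request, and some of those replicas freeze at a chosen step. `Replica`, `simulate` and all parameters are illustrative names.

```python
import random


class Replica:
    """A hypothetical backend replica that either tracks live data or is frozen."""

    def __init__(self) -> None:
        self.frozen_value = None  # set once the replica stops receiving updates

    def read(self, live_value: int) -> int:
        return live_value if self.frozen_value is None else self.frozen_value


def simulate(live_series, n_replicas=4, n_frozen=1, freeze_at=5, seed=0):
    """Return the values a client would observe, one request per time step."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    replicas = [Replica() for _ in range(n_replicas)]
    observed = []
    for t, live in enumerate(live_series):
        if t == freeze_at:
            # A subset of replicas stops updating, pinned at the value current now.
            for replica in replicas[:n_frozen]:
                replica.frozen_value = live
        # Assumed load balancer: each request goes to a uniformly random replica.
        observed.append(rng.choice(replicas).read(live))
    return observed


if __name__ == "__main__":
    live = list(range(10, 30))  # a steadily changing "true" count
    print(simulate(live))
```

With one frozen replica out of four, roughly a quarter of requests after `freeze_at` return the frozen value and the rest track the live series. This produces a two-level oscillation whose lower or upper level is pinned at whatever the live value was when the freeze happened.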
I’m seeing the same results when querying the API from two different data centres in London. The behaviour also correlates with slightly lower p99 response times, which I suspect are being pulled down by the stale responses. Perhaps the affected backends are reading from a fast but stale cache rather than fetching from the source.
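One experiment that could test the cache theory is to log the latency of every request alongside the value it returned. You could then compare latency percentiles for responses that match the suspected stale value against the rest. This is a minimal sketch that assumes you have already collected `(value, latency_seconds)` pairs for a single station after the oscillation begins and have read the stale value off the graph. `compare_latency` and the nearest-rank `p99` are hypothetical helpers.

```python
import math


def p99(latencies):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(latencies)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def compare_latency(samples, stale_value):
    """Split (value, latency_seconds) samples by whether they match the stale value."""
    stale = [lat for value, lat in samples if value == stale_value]
    fresh = [lat for value, lat in samples if value != stale_value]
    if not stale or not fresh:
        return None  # nothing to compare
    return {
        "stale_p99": p99(stale),
        "fresh_p99": p99(fresh),
        "stale_share": len(stale) / len(samples),
    }
```

If the live value happens to equal the stale value, those responses are counted as stale. A station whose real count is changing is therefore the cleaner test case. A consistently lower `stale_p99` across many samples would be consistent with the cache theory, though it would not prove it.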
Any help would be much appreciated! I’m happy to run experiments.