Noisy progress data in Unified API

FrankLieder · February 22, 2024, 9:27pm

Hi. Firstly it’s great to have such a complete public API for TFL. Bravo!

I am trying to build a realtime Tube train visualisation from the API and
whilst I’ve made good progress there seems to be some noisy behaviour which I
am struggling to understand.

Currently my app makes a Line/Arrivals request every 10 seconds to determine
the progress of each train. I take the shortest timeToStation for each vehicle
to work out which section of track the train is in. E.g. if the next station is
Oakwood and the direction is outbound I know the train is somewhere between
Southgate and Oakwood.
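
A minimal sketch of that selection step, assuming the Line/Arrivals response has already been parsed into a list of dicts with vehicleId, naptanId and timeToStation fields. The function name and the sample values are made up for illustration:

```python
from typing import Any


def next_stop_per_vehicle(predictions: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return, for each vehicleId, the prediction with the smallest timeToStation."""
    nearest: dict[str, dict[str, Any]] = {}
    for p in predictions:
        vid = p["vehicleId"]
        best = nearest.get(vid)
        if best is None or p["timeToStation"] < best["timeToStation"]:
            nearest[vid] = p
    return nearest


# Hypothetical sample: two predictions for train 357, one for train 360.
sample = [
    {"vehicleId": "357", "naptanId": "940GZZLUCGN", "timeToStation": 21},
    {"vehicleId": "357", "naptanId": "940GZZLUHBN", "timeToStation": 141},
    {"vehicleId": "360", "naptanId": "940GZZLURSQ", "timeToStation": 95},
]
nearest = next_stop_per_vehicle(sample)
```

Here nearest["357"] is the Covent Garden (CGN) prediction, so train 357 is taken to be on the section of track leading into CGN.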

E.g. here are some updates for train 357:

Time: 2024-01-21 18:44:18.635507 +00:00, Train: piccadilly-357, next/dest: 940GZZLUCGN / 940GZZLUASG, ttn: 21, current: At Leicester Square Platform 2
Time: 2024-01-21 18:44:28.726488 +00:00, Train: piccadilly-357, next/dest: 940GZZLUCGN / 940GZZLUASG, ttn: 21, current: At Leicester Square Platform 2
Time: 2024-01-21 18:44:38.826605 +00:00, Train: piccadilly-357, next/dest: 940GZZLUCGN / 940GZZLUASG, ttn: 31, current: At Leicester Square Platform 2
Time: 2024-01-21 18:44:49.000192 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 96, current: At Covent Garden Platform 2
Time: 2024-01-21 18:44:59.132799 +00:00, Train: piccadilly-357, next/dest: 940GZZLUCGN / 940GZZLUASG, ttn: 21, current: At Leicester Square Platform 2
Time: 2024-01-21 18:45:09.232881 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 96, current: At Covent Garden Platform 2
Time: 2024-01-21 18:45:19.405646 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 66, current: At Covent Garden Platform 2
Time: 2024-01-21 18:45:29.536193 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 56, current: At Covent Garden Platform 2
Time: 2024-01-21 18:45:39.628180 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 96, current: At Covent Garden Platform 2
Time: 2024-01-21 18:45:49.788447 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 26, current: Between Covent Garden and Holborn
Time: 2024-01-21 18:46:00.016988 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 66, current: At Covent Garden Platform 2
Time: 2024-01-21 18:46:10.091751 +00:00, Train: piccadilly-357, next/dest: 940GZZLUHBN / 940GZZLUASG, ttn: 56, current: At Covent Garden Platform 2
Time: 2024-01-21 18:46:20.263479 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 116, current: Between Covent Garden and Holborn
Time: 2024-01-21 18:46:30.381379 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 106, current: Between Covent Garden and Holborn
Time: 2024-01-21 18:46:40.515043 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 106, current: Between Covent Garden and Holborn
Time: 2024-01-21 18:46:50.616769 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 116, current: Between Covent Garden and Holborn
Time: 2024-01-21 18:47:00.745937 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 106, current: Between Covent Garden and Holborn
Time: 2024-01-21 18:47:11.047562 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 68, current: At Holborn Platform 4
Time: 2024-01-21 18:47:21.413901 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 58, current: At Holborn Platform 4
Time: 2024-01-21 18:47:31.586210 +00:00, Train: piccadilly-357, next/dest: 940GZZLURSQ / 940GZZLUASG, ttn: 48, current: At Holborn Platform 4

Predictions go backwards.

You can see at 18:44:49 I am told the train’s next station is HBN, but then
10 seconds later it switches back to CGN.

Is this expected?

At present I just ignore these updates and assume I am seeing some eventual
consistency behaviour across the API servers. Is this right? If so, is there
a way to request a consistent set of updates?
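
One way to do the ignoring, sketched under the assumption that each prediction can be tagged with the time its snapshot was generated (for example from a timestamp field in the response, already parsed into a timezone-aware datetime). The class and its names are hypothetical:

```python
from datetime import datetime, timezone


class StaleFilter:
    """Reject snapshots that are older than the newest one already seen per vehicle."""

    def __init__(self) -> None:
        self._latest: dict[str, datetime] = {}

    def accept(self, vehicle_id: str, generated_at: datetime) -> bool:
        last = self._latest.get(vehicle_id)
        if last is not None and generated_at < last:
            return False  # older than something already shown: drop it
        self._latest[vehicle_id] = generated_at
        return True


# Hypothetical snapshot times: the second response comes from an older cache fill.
f = StaleFilter()
t_new = datetime(2024, 1, 21, 18, 44, 45, tzinfo=timezone.utc)
t_old = datetime(2024, 1, 21, 18, 44, 15, tzinfo=timezone.utc)
first = f.accept("piccadilly-357", t_new)   # accepted
second = f.accept("piccadilly-357", t_old)  # rejected as stale
third = f.accept("piccadilly-357", t_new)   # same snapshot again: accepted
```

This only hides the backwards jumps on the client side; it does not make the responses themselves consistent.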

Can I use current_location?

My current approach does not really allow me to know how long a train is
dwelling at a station. As soon as the next station changes, I start
animating the train leaving the station. I had hoped to parse
current_location to help with this, but I see it often looks stale or out of
sync.

E.g. at 18:46:20 the train’s next station is shown as RSQ, but the current
location says it’s between CGN and HBN.
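
For reference, a rough parser for the two shapes of location text that appear in the log above ("At X Platform N" and "Between X and Y"). Other shapes may well exist, so anything unrecognised is passed through as "unknown". The function name and the tuple layout are hypothetical:

```python
import re
from typing import Optional, Tuple

AT = re.compile(r"^At (?P<station>.+?)(?: Platform (?P<platform>\S+))?$")
BETWEEN = re.compile(r"^Between (?P<a>.+) and (?P<b>.+)$")


def parse_location(text: str) -> Tuple[str, str, Optional[str]]:
    """Classify a location string as ("at", station, platform),
    ("between", from_station, to_station) or ("unknown", raw_text, None)."""
    text = text.strip()
    m = BETWEEN.match(text)
    if m:
        return ("between", m["a"], m["b"])
    m = AT.match(text)
    if m:
        return ("at", m["station"], m["platform"])
    return ("unknown", text, None)


at_lsq = parse_location("At Leicester Square Platform 2")
moving = parse_location("Between Covent Garden and Holborn")
```

A station name that itself contains " and " would split at the wrong place in the "Between" case, so this is only a starting point.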

Can I assume that the naptanId (next station, when looking at the shortest
timeToStation) is authoritative here?

Finally, is there any material on how this API works? What are the data sources
and how does the API deliver these?

Thanks,

Frank.

jamesevans · February 23, 2024, 1:47pm

Hi @FrankLieder - welcome to the forum.

I’ve put a simplified version of our data flow when it comes to tube predictions below:

Our ETL populates the database powering the API every 30s.

The Arrivals endpoint has a time-to-live of 60s and each of the servers in the caching layers returns results for 60s without calling the API application for new data. They do this independently so will replenish their caches at different points in the cycle.

What I think is happening is that sometimes you’re hitting a cache server that holds a slightly older version of the data than the one you received 10s earlier, because it’s at a different point in its refresh cycle.
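
A toy model of that effect, not a description of the real infrastructure: the 30s ETL period, 60s TTL and 10s polling interval come from this thread, while the number of cache servers, their refresh offsets, the fixed refill schedule and the round-robin routing are all assumptions made for illustration.

```python
ETL_PERIOD = 30   # seconds between database refreshes
CACHE_TTL = 60    # seconds a cache server serves one copy
POLL_PERIOD = 10  # seconds between client requests


def data_version(t: int) -> int:
    """Version of the data in the database at time t (one new version per ETL run)."""
    return t // ETL_PERIOD


def cached_version(t: int, offset: int) -> int:
    """Version served at time t by a cache that refills at offset, offset + 60, ..."""
    last_refill = ((t - offset) // CACHE_TTL) * CACHE_TTL + offset
    return data_version(last_refill)


def simulate(offsets: list[int], duration: int) -> list[tuple[int, int]]:
    """Poll every POLL_PERIOD seconds, cycling through the cache servers in turn."""
    observed = []
    for i, t in enumerate(range(CACHE_TTL, duration, POLL_PERIOD)):
        offset = offsets[i % len(offsets)]  # round-robin stands in for the load balancer
        observed.append((t, cached_version(t, offset)))
    return observed


# Three hypothetical cache servers whose cycles start 0s, 25s and 50s into the minute.
observed = simulate(offsets=[0, 25, 50], duration=300)
regressions = [(t, v) for (_, prev), (t, v) in zip(observed, observed[1:]) if v < prev]
```

In this model the poll at t=60 sees version 2 and the poll at t=70 sees version 0, which is the same kind of backwards step as in the log above. With a single cache server the observed version never decreases.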

It may be possible to reduce the endpoint cache TTL to align more to the refresh rate of the data in the database. I’ll discuss this with colleagues though as it would potentially have a performance/capacity impact on the API layer.

I think this may explain both your points, especially the first one.

Many thanks,
James

FrankLieder · February 23, 2024, 4:05pm

Fantastic, thanks for the insight into what’s happening. It was what I was thinking.

Switching to 30s caching would certainly help, but it will not eradicate the issue, as the caching servers could still be serving up to 2 different ETL cycles at the same time.

Might it also be possible to pin (best efforts affinity) requests to a specific cache server? Maybe supplying a header in the request (value contained in a previous response) to enable the LB to route to a consistent cache instance? (Is this already possible?)

Also, could you somehow expose the 30s ETL/60s cache refresh timing so I can auto-tune my calls to match this… no point me calling every 10s if the content will change only every 30s or 60s. Maybe provide a hints API to provide such metadata?
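
Until something like that exists, one client-side approximation is to derive the polling delay from standard HTTP caching headers. This is only a sketch: it assumes the responses carry Cache-Control: max-age and Age headers, which this thread does not confirm, and it expects a plain dict with those exact header names. The function name and defaults are hypothetical:

```python
import re
from typing import Mapping


def seconds_until_fresh(headers: Mapping[str, str],
                        floor: float = 1.0,
                        default: float = 30.0) -> float:
    """Estimate how long to wait before the cached response could change.

    Uses max-age minus Age; falls back to `default` when max-age is absent
    and never returns less than `floor`.
    """
    match = re.search(r"\bmax-age=(\d+)", headers.get("Cache-Control", ""))
    if match is None:
        return default
    max_age = int(match.group(1))
    try:
        age = int(headers.get("Age", "0"))
    except ValueError:
        age = 0  # malformed Age header: treat the copy as freshly cached
    return max(floor, float(max_age - age))


# Hypothetical header values, not captured from the real API.
delay = seconds_until_fresh({"Cache-Control": "public, max-age=60", "Age": "45"})
```

With those example headers the delay is 15 seconds, i.e. the remaining lifetime of the cached copy.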

Finally, are there plans for an async push API to avoid the need for polling? I could just get an update on the train whenever the underlying predictions change.

Many thanks,

Frank.

jamesevans · February 23, 2024, 4:29pm

Hi @FrankLieder - yes, it would reduce the instances of this kind of behaviour rather than fully fix it. Unfortunately, due to the CDN and API management layers, sticking your session to a single cache server isn’t possible. I’m also looking into having this cached primarily at the CDN. We don’t currently leverage this, but in theory it would give you a consistent response.

It may also be worth you subscribing to our push API using websockets so you get updates when new ones are available.

This is an example of how to subscribe to that (currently for a specific bus stop), but you can play around with adding your own parameters:

Many thanks,
James

FrankLieder · February 23, 2024, 4:47pm

Thanks, I’ll take a look at the push API.