Detecting individual trips and "real" arrival time by vehicleId?

Hi there,

I’m wondering what the best practice is for identifying real trips by vehicle, along with the relevant timestamp at each station.
E.g. I want to record that vehicle 205 on the Victoria Line started a new trip at Walthamstow Central at 8:01, stopped at Finsbury Park at 8:10, ended this trip at Brixton at 8:50.

Now, assuming I haven’t missed an endpoint with actual arrival time, all we have is expected arrival times in

Then, one could work backwards:

  • regarding the timestamp of arrival at any station, it’s a matter of updating regularly and saving the last timestamp recorded
  • regarding the identification of a trip, the best way I’ve found thus far is to check if the train inverts the direction inbound/outbound and/or if it reaches the terminus.
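
To make the second idea concrete, here is a minimal sketch of splitting one vehicle's recorded stops into trips. It assumes the stops have already been collected in time order; the `Call` record, the `split_trips` function and the `termini` parameter are hypothetical names for illustration, not anything provided by the TfL API.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One recorded stop for a vehicle (hypothetical record, not a TfL type)."""
    station: str
    arrived: str    # any sortable timestamp, e.g. "08:10"
    direction: str  # "inbound" or "outbound"


def split_trips(calls, termini):
    """Split one vehicle's time-ordered calls into trips.

    A trip is closed when the direction flips, or once the train
    has reached a station listed in `termini`.
    """
    trips = []
    current = []
    for call in calls:
        # A direction inversion means the previous trip has ended.
        if current and call.direction != current[-1].direction:
            trips.append(current)
            current = []
        current.append(call)
        # Reaching a terminus ends the trip, unless the trip only just started there.
        if call.station in termini and len(current) > 1:
            trips.append(current)
            current = []
    if current:
        trips.append(current)
    return trips
```

With the Victoria line example above, calls at Walthamstow Central, Finsbury Park and Brixton in one direction come out as a single trip, and whatever follows starts a new one. Trains going out of service or turning short would still need separate handling.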

Does this work?
Is this the only way, or is there anything better that could be done?



Welcome @puntofisso

Our system uses a cron job to capture the necessary TfL data every 5 minutes.



By storing the information every 5 minutes you can capture enough data to be able to show boards (because the arrival times are actual times, not offsets).

If you do this, you should be able to work out the differences between captures and build up a list of which train called where. A 5-minute interval should be frequent enough to capture all the stop data.

Of course, the TfL system is an “arrival times” system (not departures) and only shows trains once they have left their origin.


Good luck

Hi @briantist,
thanks for your reply :slight_smile:

Just for context, I do have a similar cronjob, except it runs every 30 seconds (the maximum frequency allowed).

I could be very wrong, but either I haven’t explained myself properly or things don’t work the way I understand from your description. Do you mean that a prediction for a past transit should reappear at some point?

That doesn’t seem to be true. For example, I’m following a train that transits through Finsbury Park. At one point, I see it entering Finsbury Park in the API, which says

"vehicleId": "212",
"naptanId": "940GZZLUFPK",
"stationName": "Finsbury Park Underground Station",
"timeToStation": 9,
"currentLocation": "At Platform",

When I query the API again 30 seconds later, the entry for Finsbury Park station has disappeared, and I only get predictions for the next stations.

So how are you suggesting I could get the actuals from this?

By the way, is there any documentation anywhere to help understand how and when the predictions are generated?

Thanks :slight_smile:


OK, this is the TfL “arrivals system” in operation. It’s 100% unable to show you departures (unlike NRE Darwin which shows both so you can work out things like true dwell times).

The moment a train is in the station, it disappears from the data flow: once arrived, it no longer has any need to provide you with any more information.

However, you can just assume that the last predicted arrival time was the time it arrived. This holds even if you poll less frequently, as per my 5-minute suggestion: it would take a substantial systematic failure for Underground trains not to arrive, and when they do arrive, their arrival times will be updated in the feed.
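
As a rough sketch of that idea: keep the latest prediction per vehicle and station, and when a pair drops out of the next poll, treat its last predicted time as the arrival. The function name `record_arrivals` is hypothetical, and I'm assuming each prediction is a dict carrying `vehicleId`, `naptanId` and an `expectedArrival` timestamp.

```python
def record_arrivals(polls):
    """Infer arrivals from a time-ordered list of polls.

    Each poll is a list of prediction dicts with the keys
    "vehicleId", "naptanId" and "expectedArrival" (assumed field names).
    Returns (vehicleId, naptanId, last expectedArrival) tuples, one for
    every prediction that was present in one poll and gone in the next.
    """
    arrivals = []
    last_seen = {}
    for poll in polls:
        # Latest prediction for each (vehicle, station) pair in this poll.
        present = {
            (p["vehicleId"], p["naptanId"]): p["expectedArrival"]
            for p in poll
        }
        # Anything seen last time but missing now is assumed to have arrived.
        for key, eta in last_seen.items():
            if key not in present:
                arrivals.append((key[0], key[1], eta))
        last_seen = present
    return arrivals
```

This treats every disappearance as an arrival, so a prediction that drops out of the feed for some other reason would be recorded too. The shorter the polling interval, the closer the last prediction will be to the real arrival.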

The Liz Line and Overground are on Darwin so you can see the full to-the-second data there if you want those lines.


(urgh, 96 second dwell time!)

But for the tubes (and DLR, but without indexes) it’s arrivals all the way: Working Timetables (WTT)

Thanks @briantist, that makes sense :slight_smile:

I’ll keep working to work out the arrival time as you suggest, it seems good enough for my purposes.

I’m still unsure what the best way to identify individual trips is. I’ll try using any “inversion” of direction, but I need to look into the data to understand whether there are nontrivial edge cases (trains going out of service, test trains, etc.).



Bakerloo. Except during engineering works, all southbound trains go all the way to E&C, but northbound “often terminate at Queen’s Park or Stonebridge Park”.

Central Line. Some can reverse in platform at White City, but otherwise serve West Ruislip and Ealing Broadway in the west…

In the east, trains go to either Hainault or Epping but can turn at other places. Woodford to Hainault is run as a shuttle.

Circle/H&C are two versions of the same physical line, but the services all run end-to-end.

District runs as

  • Wimbleware - Wimbledon to Edgware Road
  • Edgware Road to Olympia - but not very often
  • Ealing Broadway to Upminster but can turn at Tower Hill
  • Richmond to Upminster but can turn at Barking
  • Wimbledon to Upminster but can turn at Tower Hill and Barking
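
For trip detection, a list like this can be turned into a lookup of stations where a trip may legitimately end. A minimal sketch using only the District patterns above; the `DISTRICT_SERVICES` structure and the `possible_trip_ends` helper are hypothetical names, and the station names would need mapping to whatever identifiers your data uses.

```python
# District line patterns from the list above (illustrative structure only).
DISTRICT_SERVICES = [
    {"ends": ("Wimbledon", "Edgware Road"), "turn_at": ()},
    {"ends": ("Edgware Road", "Olympia"), "turn_at": ()},
    {"ends": ("Ealing Broadway", "Upminster"), "turn_at": ("Tower Hill",)},
    {"ends": ("Richmond", "Upminster"), "turn_at": ("Barking",)},
    {"ends": ("Wimbledon", "Upminster"), "turn_at": ("Tower Hill", "Barking")},
]


def possible_trip_ends(services):
    """Collect every station where a trip may start or end on these services."""
    stations = set()
    for service in services:
        stations.update(service["ends"])     # the two ends of the service
        stations.update(service["turn_at"])  # intermediate turn-back points
    return stations
```

A direction change seen at one of these stations is then a plausible trip boundary, while one seen anywhere else is worth flagging for a closer look.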

Jubilee - 99% of trains run Stanmore to Stratford, 1% turn at Wembley Park

Met - like this (note that Watford trains going to Baker Street never go all the way to Aldgate).

In peak, the Met trains use the “fast lines” to skip stops in the “direction of flow” so can physically skip stations.

Northern Line

  • Battersea Power Station (or Kennington) to Edgware via Camden Town
  • Battersea Power Station (or Kennington) to High Barnet via Camden Town
  • Battersea Power Station (or Kennington) to Mill Hill East via Camden Town
  • Morden to Edgware via Bank and Camden Town
  • Morden to High Barnet via Bank and Camden Town
  • Morden to Mill Hill East via Bank and Camden Town
  • Morden to Edgware via Charing Cross and Camden Town (not often)
  • Morden to High Barnet via Charing Cross and Camden Town (not often)
  • Morden to Mill Hill East via Charing Cross and Camden Town (not often)

Piccadilly has two branches in the west and most services go to the airport. The unidirectional loop is an extra thing for your code to do!

Victoria line - every 90 seconds trains run end to end. The Victoria line uses stepping back to allow all trains to run end to end using different drivers for each run.

London Overground - like this…

Liz runs like this

with these timings

Good luck!

That’s amazing, thanks @briantist!
