Getting unique vehicle id for arrivals

richardhawthorn · February 19, 2024, 2:17pm

Hi there

I use the ‘/line/[line]/arrivals’ endpoint to get all arrivals for each line on the tube. This generally works really well, however I understand that the vehicle id field actually returns the service/duty number, and doesn’t relate to the physical vehicle.

This wouldn’t be a problem if the value were unique, but it isn’t, and for some lines (Metropolitan, for example) many of the arrivals have the vehicle id set to “000”.

Having a unique value for each vehicle would be really useful, so I can see where the vehicles are, and would be able to see their updated positions each time I poll the api. Currently polling the api gives me a new set of results, and in many cases I have no idea how the vehicles map to the previous set of results.

I see in previous forum posts people asking for this, and the potential for it to be added to the api, but it looks like this hasn’t happened.

Is it possible to add this?

Thanks

Richard

briantist · February 19, 2024, 3:41pm

This topic was last re-raised here - How do I uniquely identify a vehicle?

richardhawthorn · February 19, 2024, 4:11pm

Digging in further it looks like TrackerNet has detailed predictions which include TrainId, which looks at a glance to be unique.

http://cloud.tfl.gov.uk/TrackerNet/PredictionDetailed/M/RUI

However one call is needed per station, so there would be a large number of calls to get data for each station, every minute.

There is the PredictionSummary endpoint which can be called per-line, but this doesn’t include the TrainId.

Is there a rate limiter on TrackerNet?

grahamwell · February 28, 2024, 1:52pm

The TrainID in TrackerNet is not as helpful for this as it initially appears. There are typically two different (and apparently unrelated) IDs for each train (depending on the line) - you’ll get arrival predictions from first one ID and then the other (or not) and sometimes they disagree. The IDs can also change, without warning, in the middle of a journey - I think this may have something to do with a driver change, but I don’t know.

In the TrackerNet data there’s also ‘LCID’ and ‘Leading Car’ but they are often blank. There’s no good solution to this I fear.

The better news is that you don’t need to poll every station to get a complete picture of the line. TrackerNet will give detailed predictions for roughly 30 minutes from the current train location, so polling stations about 20 minutes travel time apart is usually enough to get everything.
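As a rough illustration of the "poll stations about 20 minutes apart" idea, here is a minimal Python sketch that picks a subset of stations along one branch. The station codes and travel times in the example are made up, and the function name and the `max_gap_min` parameter are hypothetical; real values would have to come from a timetable or from measurement.

```python
from typing import List, Tuple


def pick_polling_stations(
    stations: List[Tuple[str, float]], max_gap_min: float = 20.0
) -> List[str]:
    """Choose stations to poll so that neighbouring picks are at most
    max_gap_min apart where the station spacing allows it.

    stations: (code, minutes of travel from the first station), ordered
    along the line. Both ends of the line are always included.
    """
    if not stations:
        return []
    chosen = [0]  # indices into stations; always start with the first one
    for i in range(1, len(stations)):
        gap = stations[i][1] - stations[chosen[-1]][1]
        # Station i is too far from the last pick, so fall back to the
        # station just before it (unless that is the last pick already).
        if gap > max_gap_min and chosen[-1] != i - 1:
            chosen.append(i - 1)
    if chosen[-1] != len(stations) - 1:
        chosen.append(len(stations) - 1)
    return [stations[i][0] for i in chosen]


# Made-up branch: six stations with cumulative travel times in minutes.
branch = [("A", 0), ("B", 8), ("C", 15), ("D", 22), ("E", 30), ("F", 41)]
print(pick_polling_stations(branch))
```

With the made-up branch above, four of the six stations would be polled instead of all six. Branching lines would need this run once per branch.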

(I am very curious as to how the TrainIDs in TrackerNet are generated, why there are two of them and why they change … any ideas very welcome)

netstruggler · February 29, 2024, 11:27am

grahamwell wrote: (I am very curious as to how the TrainIDs in TrackerNet are generated, why there are two of them and why they change … any ideas very welcome)

It’s complicated - and I certainly don’t understand all of it, but I know some.

‘Leading Car’. This is unique and a constant, it’s stencilled onto the train and is equivalent to the Licence Plate on a bus. The signalling systems which control the movement of the trains generally don’t know it though, [particularly the older systems]. More modern signalling systems will know it and may pass it on to TrackerNet.

LCID ‘Local Computer ID’. This is an arbitrary identifier generated by [older] computerised signalling systems for the purposes of following trains. It has no meaning but will be unique at any given time and will generally remain constant throughout the time that the train remains under the control of that signalling system. If the train moves to a section under a different control system and then re-enters I assume it would be given a new LCID? You shouldn’t ever see the same LCID on two different trains during a single day. Again, this information may or may not be passed to TrackerNet.

TRN ‘Train Running Number’. This is what TrackerNet most commonly provides and is the train’s number in the timetable which defines the route the train will follow. These are usually three-digit octal numbers and are equivalent to the number on the front of a bus. Unlike bus timetables there is only one vehicle with each number in the timetable but, like buses, trains may swap numbers during the day as a result of disruption or to ‘regulate the service.’ This means that, for short periods, there can be more than one train running round with the same number, or a particular number may suddenly disappear. This is particularly common during periods of extreme disruption where the timetable is abandoned and all trains may be given one of the same few numbers - or no number at all, ‘000’.

In summary: the Train Running Number is what customers are interested in, but if you want to follow the path of each train around the network the Leading Car is best, followed by LCID (assuming these are available). Trying to follow trains using TRN will work OK on good days but when the service is severely disrupted it will give strange results. Even in the timetable, TRNs are not guaranteed to be unique across different lines, so if you’re monitoring more than one line you need to take that into account.

I’m guessing now:

I think TrainID might be a unique ID which TrackerNet generates internally for the purpose of its own train following. Hopefully it provides the bridge between the different data it receives from the various control systems through which trains pass, even on the same line?
When a train moves from a computer controlled area which provides TrackerNet with an LCID value to one that does not (such as Rayners Lane to Uxbridge which I think is still manually signalled?) then TrackerNet ‘remembers’ the LCID which it originally carried?

I hope some of the above helps and would be interested in any corrections.

grahamwell · February 29, 2024, 1:24pm

Thank you, that’s really helpful and has prompted me to wonder if I can do better than I’m doing right now.

The TrainID is a curiosity. I’m rather focused on the Metropolitan line, where the problem of identifying trains is most acute (at least beyond Wembley Park) and on these bits of the network it’s quite common to have no information at all, no LCID, no Leading Car and a SetNo (Train Running Number) of ‘000’.

As I’ve already mentioned there are typically two TrackerNet TrainIDs for each physical train - to pick an example right now, this is the Watford train at Northwood Hills. Set No is 000. TrainID1 is 1096711 and TrainID2 is 1096461. In this case there is an LCID (24982) and Leading Car (21074) so I’m confident that these two sets of TrackerNet results with different TrainIDs refer to the same physical train.

This isn’t the case on all lines though, the Jubilee line for example only has one TrainID per train - perhaps you’d expect that since it’s more modern. The Central line is curious, sometimes there’s one, sometimes two.

If you can link the two IDs together through some other piece of data, that seems to be the best approach to getting a unique ID which lasts for the duration of the journey. However I’m often reduced to guessing, based on the fact that two different TrackerNet records refer to a similar ‘looking’ train and appear to be on the same bit of track. That’s not always the case, sometimes two IDs claim to be on different track sections even though they’re clearly the same train and very occasionally they give contradictory information, for example different destinations. That’s confusing.

They do occasionally change in mid-journey as well, specifically at Harrow on the Hill (where all sorts of strange and unpleasant things happen). My working theory is that these numbers relate to the train staff - it’s a guess - I really have no idea - and that these numbers may change if there’s a driver change. It pleases me to think that there’s one at the front of the train and one at the back, hence the sometimes differing track codes.

This discussion has prompted me to look again at the function that I’ve made to generate a unique identifier. After a lot of trial and error, this approach seems to work best, but I’m sure it can be improved:

If there’s a SetID that isn’t ‘000’, then use that, but put a line identifying digit at the front and add the TrackerNet ‘Trip Number’ to the end - it helps sort out the occasional duplicate SetIDs on a line. If not then …

If there’s an LCID then add a digit to the beginning to identify the line and use that (I assume I’m doing that because I found duplicate LCIDs on different lines). If the LCID is less than 99 (but greater than 0) stick 99 at the front (to avoid possible conflicts with the SetIDs from step 1). If no LCID then …

If there’s a Leading Car just use that. If not …

… use the TrackerNet TrainID, but before you return that value, check to see if there’s another TrainID on the same track. If so, link them together and return one or the other but not both (I stick them in a dictionary and pick one).

… endif
…endif
…endif
endif
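The fallback chain above can be sketched in Python roughly as follows. This is one interpretation under stated assumptions, not the actual function: the `Record` fields, the `LINE_DIGIT` values and the name `unique_id` are all hypothetical, "stick 99 at the front" is read as prefixing the LCID itself before the line digit is added, and "another TrainID on the same track" is read as "same line and same track code".

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical line digits; the post does not say which digits are used.
LINE_DIGIT = {"M": "1", "J": "2"}


@dataclass
class Record:
    """One train entry from a prediction response (field names assumed)."""
    line: str
    set_no: str = ""
    trip_no: str = ""
    lcid: str = ""
    leading_car: str = ""
    train_id: str = ""
    track_code: str = ""


def _present(value: str) -> bool:
    """Treat blank and all-zero values (such as '000') as missing."""
    return bool(value.strip().strip("0"))


def unique_id(rec: Record, track_owner: Dict[Tuple[str, str], str]) -> str:
    digit = LINE_DIGIT[rec.line]
    # 1. Usable set number: line digit + set number + trip number.
    if _present(rec.set_no):
        return f"{digit}{rec.set_no}{rec.trip_no}"
    # 2. LCID (assumed numeric): line digit + LCID, small values padded.
    if _present(rec.lcid):
        lcid = rec.lcid
        if 0 < int(lcid) < 99:
            lcid = "99" + lcid
        return f"{digit}{lcid}"
    # 3. Leading car number, used as-is.
    if _present(rec.leading_car):
        return rec.leading_car
    # 4. TrainID, linked to the first TrainID seen on the same track.
    if not rec.track_code:
        return rec.train_id
    return track_owner.setdefault((rec.line, rec.track_code), rec.train_id)


# The Northwood Hills example: two TrainIDs, one LCID, so one identifier.
owners: Dict[Tuple[str, str], str] = {}
first = Record("M", set_no="000", lcid="24982", leading_car="21074", train_id="1096711")
second = Record("M", set_no="000", lcid="24982", leading_car="21074", train_id="1096461")
print(unique_id(first, owners), unique_id(second, owners))
```

The `track_owner` dictionary needs to be rebuilt (or aged out) between polls, otherwise a TrainID would stay attached to a track section long after the train has left it.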

Which is a bit of a performance and I’m sad to say, doesn’t always work. I do get occasional duplicates, records that - when you plot them on a map - obviously relate to the same train but there is no piece of data that can automatically link them. To get round this issue I would need to add a proximity check, to see if these two ‘trains’ are so close together that they must, actually, be the same vehicle (or there’s some horrible accident in progress). Using the track location field might work for that … sometimes. Alternatively a proper graph of the track codes is what’s needed (so say we all).
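For the track-graph idea, a minimal sketch of a proximity check might look like the following. It assumes an adjacency map of track codes exists, which it currently does not; the graph contents, the function names and the `max_hops` threshold are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def track_hops(
    graph: Dict[str, List[str]], start: str, goal: str, max_hops: int
) -> Optional[int]:
    """Breadth-first search over track sections.

    Returns the number of hops from start to goal if it is at most
    max_hops, otherwise None.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def probably_same_train(
    graph: Dict[str, List[str]], track_a: str, track_b: str, max_hops: int = 1
) -> bool:
    """Two records on the same or adjacent track sections are candidates
    for being the same physical train."""
    return track_hops(graph, track_a, track_b, max_hops) is not None


# Made-up stretch of four consecutive track sections.
graph = {"T1": ["T2"], "T2": ["T1", "T3"], "T3": ["T2", "T4"], "T4": ["T3"]}
print(probably_same_train(graph, "T1", "T2"))
print(probably_same_train(graph, "T1", "T4"))
```

A hop limit of 1 would cover the "one ID at the front, one at the back" case where the two records report neighbouring track codes. It would also wrongly merge two genuinely different trains that are queued on adjacent sections, so it is best kept as a last resort after the other identifiers have failed.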

None of this will work on the DLR, where - as far as I can see - there’s no useful information at all. Sorting that out is an intriguing technical problem that I might wickedly assign to one of my A-Level students as a project, but frankly - DLR - I just don’t care.

netstruggler · February 29, 2024, 4:14pm

I understand what you mean now. The query returns a prediction for a list of trains each with a unique TrainID. When you repeat the query you get the same list of trains but each one now has a different TrainID. Then you run the query again and you get the list with the first set of IDs again.

I wonder if this is the same problem as discussed here?

Where results are being returned from two different data caches?

grahamwell · February 29, 2024, 7:24pm

Yes, that’s possible, but if it was something to do with caching you’d expect to see the same thing on all lines and you don’t.

2 IDs: Bakerloo, Metropolitan, District
1.5 IDs (seems to vary): Hammersmith and Circle, Central, Piccadilly
1 ID: Victoria, Jubilee, Northern

The “sometimes one, sometimes two” lines are peculiar. It could be an error with my coding, but if it was I’d expect to see duplicate trains … that’s what happens on the Metropolitan if I get the matching wrong, however I don’t see that. (actually, come to think of it, if I get a decent SetID, LCID or Leading Car then I wouldn’t end up with a duplicate - maybe it is just me - I will investigate)

So you might well be right, it might be that there are two different caches or servers responding to the TrackerNet queries and generating their own IDs. Hmmm.

(My pet theory was that it’s the number of TfL staff on the train and that they carry a magic box with them that generates the data and this ID, but that does sound a bit far-fetched)

netstruggler · March 1, 2024, 9:13am

Maybe inconsistent data into TrackerNet? The three good lines are the most recently modernised, though the District, Met and H&C are part way through a modernisation and are expected to improve.

To clarify - this is what I saw running the query from richardhawthorn’s post above:

First time

then I run it again and the TrainID for train 440 trip 3 changes from 1146521 to 1146741

then I run it again and the TrainID changes back to 1146521

and so on; it continues flipping between those 2 values until the train arrives.
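One way to spot this flipping automatically is to record every TrainID seen for each (set number, trip number) pair across successive polls. The sketch below assumes each poll has already been parsed into simple tuples; the function name and the tuple layout are hypothetical. It only helps when the set number is usable, so ‘000’ entries are skipped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One observation: (set number, trip number, TrainID), all as strings.
Observation = Tuple[str, str, str]


def collect_aliases(
    polls: Iterable[List[Observation]],
) -> Dict[Tuple[str, str], List[str]]:
    """Return the (set, trip) pairs that appeared with more than one TrainID."""
    seen = defaultdict(set)
    for poll in polls:
        for set_no, trip_no, train_id in poll:
            if set_no.strip("0") == "":
                continue  # '000' or blank: no usable running number
            seen[(set_no, trip_no)].add(train_id)
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}


# The three polls described above: train 440, trip 3.
polls = [
    [("440", "3", "1146521")],
    [("440", "3", "1146741")],
    [("440", "3", "1146521")],
]
print(collect_aliases(polls))
```

Once a pair of TrainIDs is known to be aliases, either one can be mapped to the other before comparing a new poll with the previous one.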

My gut feeling is that, bad data or not, TrackerNet could do better.