A couple of inconsistencies in tube station entrance data

I’ve just started working on a new app using the Unified API and have been fetching the data we need about stations and their entrances from the StopPoints by Mode endpoint. (I actually tried to get DLR and Tube from this endpoint at the same time, but got a 504 GATEWAY_TIMEOUT, probably just too much data!)

After also using TransportInterchange StopPoints to find entrances for Overground / National Rail stations that are also Tube / DLR stations, I got entrance data (no idea whether it’s complete, but at least some) for all but 2 Tube stations.

  1. Edgware (Northern line) does not appear to have any entrances in the data. I think it’s a small station with one entrance, but the station coordinate is not the entrance position, so I thought it was worth reporting.

  2. Wimbledon (naptanId: “HUBWIN”): the TransportInterchange does actually have entrance data, but my initial script didn’t find it because of an odd nested National Rail station structure. The NaptanMetroStation doesn’t have entrance data. The TransportInterchange has a child NaptanRailStation with naptanId “910GWDON”, which doesn’t have any entrance data, but it in turn has a child NaptanRailStation with naptanId “910GWIMBLDN”, which does.
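A minimal Python sketch of a recursive search that would catch the Wimbledon case, and that also lists stations (like Edgware) where nothing turns up. It assumes each StopPoint is a dict with `naptanId`, `stopType` and `children` keys, and that entrances are children whose `stopType` ends in "Entrance"; check both assumptions against your own responses. The sample tree is hypothetical and cut down: only HUBWIN, 910GWDON and 910GWIMBLDN come from this thread, and the other IDs and the entrance `stopType` are made up for illustration.

```python
from typing import Any, Dict, List

StopPoint = Dict[str, Any]


def is_entrance(stop: StopPoint) -> bool:
    # Assumption: entrance StopPoints have a stopType ending in "Entrance".
    return str(stop.get("stopType", "")).endswith("Entrance")


def find_entrances(stop: StopPoint) -> List[StopPoint]:
    """Search the whole subtree, not just direct children, so entrances
    nested under an intermediate station are still found."""
    found: List[StopPoint] = []
    for child in stop.get("children", []):
        if is_entrance(child):
            found.append(child)
        found.extend(find_entrances(child))
    return found


def stations_without_entrances(stations: List[StopPoint]) -> List[str]:
    """naptanIds of top-level StopPoints with no entrance anywhere below them."""
    return [s["naptanId"] for s in stations if not find_entrances(s)]


# Hypothetical sample data shaped like the Wimbledon case.
wimbledon = {
    "naptanId": "HUBWIN",
    "stopType": "TransportInterchange",
    "children": [
        {"naptanId": "EXAMPLE-METRO", "stopType": "NaptanMetroStation", "children": []},
        {
            "naptanId": "910GWDON",
            "stopType": "NaptanRailStation",
            "children": [
                {
                    "naptanId": "910GWIMBLDN",
                    "stopType": "NaptanRailStation",
                    "children": [
                        {"naptanId": "EXAMPLE-ENTRANCE", "stopType": "NaptanRailEntrance"},
                    ],
                },
            ],
        },
    ],
}
edgware = {"naptanId": "EXAMPLE-EDGWARE", "stopType": "NaptanMetroStation", "children": []}

print([e["naptanId"] for e in find_entrances(wimbledon)])  # ['EXAMPLE-ENTRANCE']
print(stations_without_entrances([wimbledon, edgware]))    # ['EXAMPLE-EDGWARE']
```

Searching only the interchange's direct children would miss the Wimbledon entrance, which sits two station levels down.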

Just reporting in case anyone would like to fix this kind of inconsistency.

Mark

@markw

Welcome!

I suspect your 504 is because you were being rate-limited. It’s a good idea to both cache your API requests and limit the rate of calling. You might find sleep(5) will do the trick.
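If rate limiting is the cause, a small throttle along these lines spaces out calls. This is a sketch: `Throttle` and `fetch` are hypothetical names, and the 5-second default just mirrors the sleep(5) above rather than any documented TfL limit. The clock and sleep functions are injectable so the timing logic can be checked without actually waiting.

```python
import time
import urllib.request
from typing import Callable, Optional


class Throttle:
    """Ensures at least `min_interval_s` seconds pass between calls to wait()."""

    def __init__(
        self,
        min_interval_s: float = 5.0,  # mirrors the sleep(5) suggestion
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(url: str, throttle: Throttle) -> bytes:
    """Hypothetical helper: wait for the throttle, then do a plain GET."""
    throttle.wait()
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()
```

Unlike a fixed sleep(5) after every request, this only sleeps for whatever part of the interval hasn't already elapsed.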

Not sure about Edgware, but Wimbledon is a bit of a nightmare. There are two tram stops which aren’t “inside” the station, whereas the District Line platforms are…

So there are yellow “tram payment” card readers by platforms 10 and 10b.

Looking at https://www.nationalrail.co.uk/stations-and-destinations/stations-made-easy/wimbledon-station-plan, which tends to be accurate about such things, there is no mention of platforms 10/10b.

The “HUBWIN” will also contain the Bus Stops that have “Wimbledon Station” in their name.

Thanks @briantist.

I suspect your 504 is because you were being rate-limited. It’s a good idea to both cache your API requests and limit the rate of calling. You might find sleep(5) will do the trick.

Not a rate limit - a single API call:
https://api.tfl.gov.uk/StopPoint/Mode/tube%2Cdlr?app_id={{app_id}}&app_key={{app_key}}

Note that this API call just for ‘tube’ returns about 31.5MB of JSON. I’m filtering this down heavily (to the less than 70KB that I need) and then caching it. I’ll probably check for updates once a day or similar when we get to production.
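A sketch of the filter-then-cache approach, assuming the slimmed JSON is kept in a local file and refreshed once it is more than a day old. `slim`, `load_cached` and the `KEEP` field list are illustrative names, not anything from the TfL API; keep whichever fields your app actually uses.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict

# Illustrative field list: keep only what your app actually uses.
KEEP = ("naptanId", "commonName", "stopType", "lat", "lon")
ONE_DAY_S = 24 * 60 * 60


def slim(stop: Dict[str, Any]) -> Dict[str, Any]:
    """Drop unneeded fields, recursing into children."""
    out = {k: stop[k] for k in KEEP if k in stop}
    out["children"] = [slim(c) for c in stop.get("children", [])]
    return out


def load_cached(path: Path, refresh: Callable[[], Any], max_age_s: float = ONE_DAY_S) -> Any:
    """Return the JSON cached at `path`, calling refresh() only if the file
    is missing or older than max_age_s."""
    if path.exists() and time.time() - path.stat().st_mtime < max_age_s:
        return json.loads(path.read_text(encoding="utf-8"))
    data = refresh()
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

In practice `refresh` would fetch the StopPoints for one mode, apply `slim` to each entry and return the list, so only the small filtered result is ever written to disk.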

My report for Wimbledon was more related to inconsistent data structure making the data harder to consume. Without double-checking the data I’m not using, I seem to remember the tram and bus station/stops were available as children of the interchange. It’s just a national rail station having another national rail station as a child that seemed particularly unusual.

I’m sure I’ll get to data issues reflecting the complexity of the real world stations in unexpected ways later… I already noticed some unique properties of Canary Wharf in that regard. :slight_smile:

OK, I might suggest that you break that down into https://api.tfl.gov.uk/StopPoint/Mode/tube and https://api.tfl.gov.uk/StopPoint/Mode/dlr as they both come back within 30ms for me.
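Splitting the combined request into one call per mode and merging the results might look like this. It assumes the endpoint returns an object with a `stopPoints` array and de-duplicates by `naptanId`; `mode_url` and `fetch_modes` are hypothetical helpers, and `get_json` is injectable so the merge can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterable, List, Optional

BASE_URL = "https://api.tfl.gov.uk/StopPoint/Mode/"


def mode_url(mode: str, params: Optional[Dict[str, str]] = None) -> str:
    """URL for a single mode, optionally with query parameters such as app_key."""
    url = BASE_URL + urllib.parse.quote(mode)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def http_get_json(url: str) -> Any:
    with urllib.request.urlopen(url, timeout=120) as resp:
        return json.load(resp)


def fetch_modes(
    modes: Iterable[str],
    params: Optional[Dict[str, str]] = None,
    get_json: Callable[[str], Any] = http_get_json,
) -> List[Dict[str, Any]]:
    """One request per mode, merged and de-duplicated by naptanId."""
    merged: Dict[str, Dict[str, Any]] = {}
    for mode in modes:
        payload = get_json(mode_url(mode, params))
        # Assumption: the response wraps results in a "stopPoints" array.
        for stop in payload.get("stopPoints", []):
            merged.setdefault(stop["naptanId"], stop)
    return list(merged.values())
```

Usage would be something like `fetch_modes(["tube", "dlr"], {"app_key": "YOUR_KEY"})`. Keeping the first copy of any duplicated `naptanId` is an arbitrary choice; merging the two copies is an alternative if they ever differ.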

My report for Wimbledon was more related to inconsistent data structure making the data harder to consume.

Welcome to the world of station data! I find that merging all the sub-levels into a single flat dataset works for me generally, but the structure does signify useful relationships … sometimes.
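One way to flatten the hierarchy without losing those relationships entirely is to record each StopPoint's parent and depth. A minimal sketch, assuming the same dict shape with `naptanId`, `stopType` and `children` keys; `flatten` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, Optional


def flatten(
    stop: Dict[str, Any], parent_id: Optional[str] = None, depth: int = 0
) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per StopPoint, keeping a pointer to its parent."""
    yield {
        "naptanId": stop.get("naptanId"),
        "stopType": stop.get("stopType"),
        "parentId": parent_id,
        "depth": depth,
    }
    for child in stop.get("children", []):
        yield from flatten(child, stop.get("naptanId"), depth + 1)
```

With `parentId` kept on every row, the flat list can still be grouped back by interchange when the structure does turn out to matter.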

It’s just a national rail station having another national rail station as a child that seemed particularly unusual.

That’s because what you have is two stations right next to each other. One is the District Line, built as the Metropolitan District Railway. The other is the South Western main line. They are conceptually two stations. However, unlike Waterloo, where the Underground and main-line stations both have platform numbering starting at “1”, here both “stations” share a numbering scheme.

Canary Wharf will be complex when the Liz Line opens, but using the Jubilee Line does sometimes mean two beeps at two gatelines.

This is always a great resource if you need to find out what is really going on…


OK, I might suggest that you break that down into https://api.tfl.gov.uk/StopPoint/Mode/tube and https://api.tfl.gov.uk/StopPoint/Mode/dlr as they both come back within 30ms for me.

Thanks, yes, that’s exactly what I did already and it works. The API advertises that you can ask for multiple comma-separated modes, but it can’t actually do that (at least not for tube + dlr; I didn’t try any other combinations).

I am also going for the single flat dataset approach. At least to start with. Fortunately my app doesn’t need to know that much about the stations currently.

Hmmm, so there are still two nested NaptanRailStation type entities because parts of the District Line used to be the Metropolitan District Railway that was actually a National Rail line and not a London Underground line? Even if this is the case, the nesting seems unnecessarily confusing, and the NaptanMetroStation for the District Line is in the correct place in the dataset, so the old entry would seem redundant and worthy of removing.

I have no idea how this data is actually maintained though… hence posting into the void on a forum and hoping someone that cares and has the power to fix it might see it and feel inspired to do something about it.

I imagine that @jamesevans might be around and able to look at it when he has time. I’m not sure if the data is incorrect, but it’s quite, quite possible that it is. I’ve “helped out” with quite a bit of it and there have been ongoing bus timetable issues too.

But things do get fixed, eventually!


@markw

I’m guessing this might explain the hierarchy of station codes?

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/561318/background-information-naptan.pdf

I’d found that and had a browse, but it doesn’t specifically mention the TfL stopTypes and how we should expect them to be organised. Basically stopPoints can have children which are other stopPoints… the schema doesn’t seem to impose more rigid constraints than that.
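Since the schema only says that StopPoints can contain StopPoints, one way to see how the data is organised in practice is to count which parent/child `stopType` combinations actually occur. A sketch, assuming dict-shaped StopPoints; `stop_type_edges` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def stop_type_edges(stops: Iterable[Dict[str, Any]]) -> Counter:
    """Count (parent stopType, child stopType) pairs across all trees."""
    counts: Counter = Counter()

    def walk(stop: Dict[str, Any]) -> None:
        for child in stop.get("children", []):
            counts[(stop.get("stopType"), child.get("stopType"))] += 1
            walk(child)

    for stop in stops:
        walk(stop)
    return counts
```

Printing `stop_type_edges(stops).most_common()` gives an empirical picture of the hierarchy, and rare pairs are good candidates for the kind of oddity seen at Wimbledon.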

@markw

I think the idea of NaPTAN is that it works UK-wide, so the TfL data should align with all other data. I don’t think there are any specific hierarchies imposed, only what happens “in the real world”.