Countdown / Improving bus-stops.csv

In recreating the pre-May 2016 Countdown site at http://eztfl.pectw.net/bus, I’ve noticed that your https://data.tfl.gov.uk/tfl/syndication/feeds/bus-stops.csv file is chronically incomplete.

a. My site converts the user’s bus stop code to the naptan code using bus-stops.csv per these docs
b. The naptan code is used to generate the https://api.tfl.gov.uk/StopPoint/<naptan>/arrivals API call to retrieve bus countdown data per documentation here
c. 100 of the bus stop codes in that CSV file do not have naptan entries
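
Steps a and b, plus the check behind c, can be sketched roughly as follows. This is a minimal illustration rather than the site's actual code: the column names Bus_Stop_Code and Naptan_Atco are assumptions about the CSV header and should be checked against the real file, and the two sample rows are cut down to those two columns only.

```python
import csv
import io

# URL pattern quoted in step b above.
ARRIVALS_URL = "https://api.tfl.gov.uk/StopPoint/{naptan}/arrivals"


def load_naptan_map(csv_text, code_col="Bus_Stop_Code", naptan_col="Naptan_Atco"):
    """Return (mapping, missing).

    mapping: bus stop code -> naptan code
    missing: bus stop codes whose naptan column is empty
    The column names are assumptions, not confirmed against the real file.
    """
    mapping, missing = {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = (row.get(code_col) or "").strip()
        naptan = (row.get(naptan_col) or "").strip()
        if not code:
            continue  # skip rows with no bus stop code at all
        if naptan:
            mapping[code] = naptan
        else:
            missing.append(code)
    return mapping, missing


def arrivals_url(naptan):
    """Build the arrivals API call for a naptan code."""
    return ARRIVALS_URL.format(naptan=naptan)


# Illustrative two-column sample; the real file has more columns.
SAMPLE = "Bus_Stop_Code,Naptan_Atco\n73241,490010260SD\n91623,\n"

mapping, missing = load_naptan_map(SAMPLE)
print(arrivals_url(mapping["73241"]))
print(missing)
```

Running the same check over the full file is how the stop codes without naptan entries can be listed.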

Example: Bus stop code 91623. The CSV indicates on row 17250 (as of this date) that there is no naptan code assigned. However, entering bus stop code 91623 at your main site leads to the page https://tfl.gov.uk/bus/stop/490001206P/new-malden-station/ which has the naptan right there in the URL. This has been so since mid-March 2017.

Solution sought:
(Assuming my process above is otherwise sound)
Option 1. Can we have the bus-stops.csv file brought up to date, please? It would also help if we developers could determine when that file was last checked and considered current. The HTTP Last-Modified header doesn’t cut the mustard. Adding a “Last checked current” row or column would work, as would using HTTP 302 to redirect from bus-stops.csv.latest to bus-stops.<timestamp-last-checked-current>.csv. Adding the date to the docs is less friendly to automation.
Option 2. Your main site’s behaviour in the example suggests your own developers have access to a superior file for converting bus stop codes to naptan. If so, where can we find it?
Option 3. Add support for https://api.tfl.gov.uk/StopPoint/<bus stop code>/arrivals. This would also be closer to the Semantic Web ideal, as bus stop codes are available at bus stops, whereas naptans are not.

Thanks!

Yes that bus-stops.csv is very out of date - I’m not sure if we should still be distributing that at all. I’ll request an update or removal of this data.

The example you give using the main site uses the Search API, /StopPoint/Search?query={id}, which detects that your query is numerical and returns a StopPoint with matching SMS code. You can then plug the resulting id of the MatchedStop into the Arrivals API:

$ curl -s "https://api.tfl.gov.uk/StopPoint/Search?query=91623" | jq '.matches[0].id'
"490001206P"

$ curl -s "https://api.tfl.gov.uk/StopPoint/490G01206H/Arrivals" | jq '.'
[]

Hmm - it’s empty. Looks like that stop doesn’t serve any routes. Let’s try one that does:

$ curl -s "https://api.tfl.gov.uk/StopPoint/Search?query=73241" | jq '.matches[0].id'
"490010260SD"

$ curl -s "https://api.tfl.gov.uk/StopPoint/490010260SD/Arrivals" | jq '.[] | .destinationName, .expectedArrival'
"Fulham Broadway"
"2017-04-13T18:54:42Z"
"Fulham Broadway"
"2017-04-13T19:02:08Z"
"Fulham Broadway"
"2017-04-13T18:42:40Z"
"Shepherd's Bush"
"2017-04-13T18:53:43Z"
"Shepherd's Bush"
...etc

That should allow you to do Option 3 maybe?
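
The same two-step lookup could be scripted along these lines. This is only a sketch: it assumes the Search response has the shape the jq filter above relies on (a matches array whose entries carry an id), and the helper names are made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API = "https://api.tfl.gov.uk/StopPoint"


def search_url(stop_code):
    """URL for the Search API, queried with a numeric bus stop (SMS) code."""
    return f"{API}/Search?query={quote(str(stop_code))}"


def arrivals_url(stop_id):
    """URL for the Arrivals API for a given StopPoint id."""
    return f"{API}/{quote(stop_id)}/Arrivals"


def first_match_id(search_response):
    """Return the id of the first matched stop, or None if nothing matched."""
    matches = search_response.get("matches") or []
    return matches[0]["id"] if matches else None


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def arrivals_for_stop_code(stop_code):
    """Search by bus stop code, then fetch arrivals for the first match."""
    stop_id = first_match_id(fetch_json(search_url(stop_code)))
    if stop_id is None:
        return []
    return fetch_json(arrivals_url(stop_id))
```

Calling arrivals_for_stop_code(73241) would make both requests in turn; an empty list means either no match or no arrivals.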

Internal tracking
Update bus-stops.csv: TECH-220
Bug in /StopPoint/Sms/{id} API: SVC-3591

I would certainly urge keeping bus-stops.csv:

It saves considerable bandwidth for both TfL and developers. The Search?query step adds the equivalent of the bus-stops.csv in traffic, to both of us, every 168 requests (1453580b vs 8620b, as measured with tcpdump -v).

It saves considerable time. With the Search?query lookup, the entire process of obtaining the countdown takes approx. 45% longer (back-of-the-envelope figure from watching the network traffic in Chrome’s developer tools).

It would also keep TfL’s CPU load down, as Search?query wouldn’t need so much use.

The above is based on temporarily implementing the Search?query step instead of using bus-stops.csv.
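
For anyone wanting to check the bandwidth figure, the break-even point is just the ratio of the two measurements. The sketch below assumes both numbers are bytes and that 8620 is the traffic added by one Search?query lookup.

```python
# Measurements quoted above (assumed to be bytes, as seen via tcpdump -v).
CSV_BYTES = 1453580    # one download of bus-stops.csv
SEARCH_BYTES = 8620    # one Search?query lookup

# Number of Search lookups whose traffic equals one CSV download.
break_even = CSV_BYTES / SEARCH_BYTES
print(round(break_even, 1))  # 168.6
```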

Cheers!

Hi,

Rest assured, for now (at least) we will be keeping bus-stops.csv.

I am currently investigating the missing naptan entries with the data owners, including how we can prevent such issues from arising again in subsequent uploads of the bus-stops file.

Will provide an update once I hear back from them.

Thanks,

Dave


This is being investigated under TECH-229.