As I’m sure everyone is aware, GTFS is the de facto international standard for storing transit network data.
I would like to enquire whether there are any plans to release this data within the API or at another public location. If there are no such plans, I would suggest this should be a very high priority release, as it would greatly improve people's ability to work with and analyse TfL network data.
It is a relatively simple format to produce, which TfL should already have maintained somewhere, given that Google Maps has TfL data and Google requests transit data in GTFS format: Reference | Static Transit | Google Developers
The vast majority of software and open source libraries that work with transit data are designed around GTFS due to its prevalence. Not being able to apply any of this work to TfL data without creating or finding some sort of conversion software is a severe limitation and makes a lot of projects harder than they need to be.
TransXChange is the UK standard format for transport timetable information, and TfL's timetables are available in that format from our Open Data pages. GTFS is originally a Google specification, and while it has become a de facto standard as you say, it didn't meet our needs when developing a RESTful API. We were also aware of a few TransXChange converters, so didn't feel that re-inventing the wheel would add any value.
As far as I’m aware, Google do their own conversion work from TransXChange to whatever they use internally, which may well be GTFS, but they might skip that intermediate format and just import TransXChange directly. I’ll see what I can find out from the data owner.
The short answer to your question is that no, there are not any plans to offer a TfL-curated GTFS data set. This is certainly an area where the Open Data community could help, perhaps even with a solution that would work for the whole of the UK.
It’s also true that there is plenty of information that can be stored in TransXChange that can’t be shown in GTFS - or at least this was the case last time I checked. TransXChange is a horrible format, and many improvements could be made, but the fact that it is easily able to handle every nuance of bus, tube, rail and even the cable car without breaking sweat means it will always have a benefit to those of us looking for accurate data, and is why there are linked standards to it across Europe.
To bring in some experience from a recent project I undertook for the DfT - it’s very hard to agree on ‘the standard’. Whilst one person may want GTFS because it maps to their domain well, somebody else may find the complete lack of any additional data a showstopper. And then if you release the data in TransXChange format, somebody else might want it in RailML or SIRI. But even SIRI has corner-cases, e.g. a Company X variant with some fields populated in a different way to Company Y’s variant, so you still have work to do.
On the heavy rail side of things, we have schedule data published in its native CIF format because, when it was published only in a more lightweight JSON format, everyone who had software which copes with the deeply entrenched CIF standard didn’t have the appetite to move over. And to be honest, why should they?
I don’t think there will ever be a standard format or model that fits every sort of timetable. At some point, you have to put the effort in to deal with incompatible data standards - but that’s where the fun is!
Thanks very much for the reply and for investigating. If Google end up converting to GTFS, that might be a potential ready-made source for it.
If there are converters available, would it be difficult for TfL to perform that conversion centrally and then serve it online? As Poggs mentioned, I know it can be a bit 'me too' with these things, but GTFS is used by so many agencies worldwide and has so much research and software built around it that it would be a very valuable way of opening data up and making things accessible.
But given TfL isn't planning to do so any time soon, I suppose I had better start looking into these converters and see how they work!
I’ll try and report back with how the process went.
To report on progress: CommuteStream/tflgtfs (Transport For London GTFS Exporter) on GitHub looks like the most promising converter. However, I have been getting an error when fetching the lines, which I have raised as an issue on GitHub: "Error on running fetch-lines", Issue #15, CommuteStream/tflgtfs.
Turned out to be an issue with the API which was swiftly corrected! I’ll report on things once I’ve explored the result.
Partial success! With the GTFS data converted I managed to produce a nice clean graph of the transport network (no National Rail sadly), with a journey time on each edge.
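For anyone wanting to try something similar, here is a minimal sketch of how a graph like this could be derived from a GTFS stop_times.txt file. It is not the code used for the result above. The sample rows and stop IDs (A, B, C) are made up for illustration, and it assumes every row carries both an arrival and a departure time, which real feeds do not guarantee.

```python
import csv
import io
from collections import defaultdict


def to_seconds(hms):
    """Convert a GTFS HH:MM:SS time to seconds (hours may exceed 23)."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def build_edges(stop_times_file):
    """Return {(from_stop, to_stop): shortest travel time in seconds}."""
    # Group the rows of stop_times.txt by trip.
    trips = defaultdict(list)
    for row in csv.DictReader(stop_times_file):
        trips[row["trip_id"]].append(row)

    edges = {}
    for rows in trips.values():
        # Order each trip's stops, then walk consecutive pairs.
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        for a, b in zip(rows, rows[1:]):
            seconds = to_seconds(b["arrival_time"]) - to_seconds(a["departure_time"])
            key = (a["stop_id"], b["stop_id"])
            # Keep the fastest observed time for each edge.
            if key not in edges or seconds < edges[key]:
                edges[key] = seconds
    return edges


# Hypothetical sample data, not taken from any real TfL export.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:30,A,1
t1,08:03:00,08:03:30,B,2
t1,08:07:00,08:07:30,C,3
t2,25:00:00,25:00:00,A,1
t2,25:02:00,25:02:00,B,2
"""

edges = build_edges(io.StringIO(SAMPLE))
for (origin, destination), seconds in sorted(edges.items()):
    print(origin, "->", destination, seconds, "seconds")
```

Taking the minimum per edge is just one choice; a mean or median across trips may suit some analyses better.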
No National Rail because the GTFS exporter software (CommuteStream/tflgtfs: Transport For London GTFS Exporter, on GitHub) is throwing errors
Error (line southern): The stop you selected has now been removed from the route, and therefore we cannot show you a timetable. The route page will be updated shortly to reflect these changes.
on locating any of the National Rail lines, which means the graph is not complete and therefore any analysis would be incomplete. There are also a few other errors which mean the result is not 100% correct.
I've filed an issue on GitHub (Issue #16, CommuteStream/tflgtfs: "Error (line XXXXX): The stop you selected has now been removed from the route, and therefore we cannot show you a timetable."), although it may be an API issue just like last time.
In conclusion, my main worry is that any change in the API will break this sort of methodology, and the converter is not in a language I am familiar with (although I really wish I could help with this, so maybe one day I can learn).
I do continue to wish that TfL would consider either maintaining a GTFS feed themselves or, better yet, serving static graphs of the transport network for the purposes of scientific analysis as discussed here.
Very interesting, thanks @wa-ssx. We don't load all of the National Rail timetables into our API, so that would explain the error you're seeing. That message is indeed from our API backend. We are exploring whether we can load the entirety of the UK dataset into the API, which should fix those problems and allow for a full UK GTFS export…
Out of interest, how long does the converter take to run?
Ah great, good to know full National Rail timetables are being explored (although I couldn't find the Overground or other London rail in the output either). I think in all it took about 20 or so minutes to run. For the majority of the time the program was downloading JSON files from TfL API calls into a cache; after that, the actual GTFS output is pretty quick!
I fundamentally disagree. While GTFS is succinct and to the point, TransXChange is a sprawling mess which is so "nuanced" it quickly becomes unusable. Any spec which can specify a period of operation as "School Holidays" is broken. Moreover, GTFS handles multi-modal just fine. Few people outside the UK seem to use TransXChange. Having said all that, TfL's handling of TransXChange is the best I've come across.
I think we will have to agree to disagree then. I don't like TransXChange either, but "School Holidays" is one of those things you have to deal with in the UK - it will depend on borough and sometimes even on school. (For example, my old secondary school had service buses that departed from the playground at the end of a school day, but started at the other end of the village at all other times.)
It would be lovely if the data was provided specifying actual dates as it is with National Rail, I admit, but I am not sure GTFS is the correct format for this either!
GTFS forces you to supply actual dates, either in calendar.txt or calendar_dates.txt. From the spec: "If there are specific days when a trip is not available, such as holidays, you should define these in the calendar_dates.txt file." This file is also used to define additional services, e.g. those that only run on "School Holidays". It may require extra work to update these files from time to time. Berlin is a good example. At the beginning of each year they publish the entire year's schedules. Periodically throughout the year they issue updates when the schedule has to alter. If you are using the extended set of route types you can also define the route as purely a school bus, i.e. not for general use.
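As a rough illustration of how the two files combine, here is a minimal sketch that works out which services run on a given date. The service IDs (school_term, school_hols) and the dates are invented for the example and do not come from any real feed. In calendar_dates.txt, exception_type 1 adds a service on that date and 2 removes it.

```python
import csv
import io
from datetime import date, datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_date(text):
    """Parse a GTFS YYYYMMDD date."""
    return datetime.strptime(text, "%Y%m%d").date()


def active_services(calendar_file, calendar_dates_file, day):
    """Return the set of service_ids running on the given date."""
    active = set()
    # Regular weekly pattern from calendar.txt.
    for row in csv.DictReader(calendar_file):
        in_range = parse_date(row["start_date"]) <= day <= parse_date(row["end_date"])
        if in_range and row[WEEKDAYS[day.weekday()]] == "1":
            active.add(row["service_id"])
    # Explicit per-date exceptions from calendar_dates.txt.
    for row in csv.DictReader(calendar_dates_file):
        if parse_date(row["date"]) != day:
            continue
        if row["exception_type"] == "1":    # service added on this date
            active.add(row["service_id"])
        elif row["exception_type"] == "2":  # service removed on this date
            active.discard(row["service_id"])
    return active


# Hypothetical feed: a term-time service, swapped for a holiday one on 15 Feb.
CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
school_term,1,1,1,1,1,0,0,20240101,20241231
"""
CALENDAR_DATES = """service_id,date,exception_type
school_term,20240215,2
school_hols,20240215,1
"""


def services_on(day):
    """Convenience wrapper that re-reads the sample text for each query."""
    return active_services(io.StringIO(CALENDAR), io.StringIO(CALENDAR_DATES), day)


print(services_on(date(2024, 2, 14)))  # an ordinary Wednesday
print(services_on(date(2024, 2, 15)))  # the date with exceptions
```

So a "School Holidays" pattern ends up as a list of explicit dates, which somebody still has to maintain per borough or per school.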
Thanks for sharing your excellent work. I have tried to download and compile CommuteStream on a Linux machine, but because it requires a very old OpenSSL library, I couldn't compile it successfully. Could you tell me how to get it to compile? And if I need GTFS data for 2017 and 2018, can CommuteStream fetch older data like that?