I’m currently working on a project that uses real-time data from multiple public transport operators to improve the commuting experience in London. I’ve looked into a couple of the existing APIs, such as TfL’s Unified API, but I still have a few questions and ideas that I’m hoping this community can help with.
Real-Time Disruptions Data: Although disruption data is accessible, I’ve found that it occasionally lacks the detail needed to make informed decisions. Is it possible, for example, to get more precise real-time information about specific disruption types or causes (such as signal failures or accidents)? This would significantly improve my application’s predictive power.
Crowding Data: While crowding information is currently available through the API, I’m wondering whether there are any plans to expand its coverage or improve its accuracy. I’m particularly interested in getting crowding levels for buses or the DLR that are comparable to those available for the Underground.
API Rate Limits: Can anyone share how they’ve managed API rate limits when scaling an application? I’m planning a service that could see heavy use, so I want to make sure everything runs smoothly without hitting those limits.
Future Plans: Will TfL be releasing any new data sources or API functionality in the near future? It would be helpful to keep up with these developments so I can plan future improvements to the project.
There are usually two codes for each train: one for the cancellation reason and one for the delay reason. The former might be caused by the latter. If you have a RID, the Darwin Historical API can give you these codes for TfL trains going back several years.
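If you want to try this, here’s a minimal sketch of pulling those reason codes for a known RID. The endpoint URL, basic-auth credentials and response field names are assumptions based on my recollection of the Historical Service Performance (HSP) docs, so check the National Rail Data Portal before relying on them.

```python
# Sketch only: look up delay/cancellation reason codes for a RID via the
# Darwin Historical Service Performance (HSP) API. The URL, auth scheme and
# field names are assumptions; verify them against the National Rail Data
# Portal documentation before use.
import requests

HSP_DETAILS_URL = "https://hsp-prod.rockshore.net/api/v1/serviceDetails"  # assumed endpoint
AUTH = ("registered-email@example.com", "password")                       # hypothetical credentials

def reason_codes_for_rid(rid: str) -> list[dict]:
    resp = requests.post(HSP_DETAILS_URL, json={"rid": rid}, auth=AUTH, timeout=30)
    resp.raise_for_status()
    details = resp.json().get("serviceAttributesDetails", {})
    codes = []
    for loc in details.get("locations", []):
        # 'late_canc_reason' is the assumed field carrying the delay/cancellation code
        if loc.get("late_canc_reason"):
            codes.append({"location": loc.get("location"), "reason": loc["late_canc_reason"]})
    return codes
```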
Crowding Data
The data is 96 datapoints for each station, per day: each 15-minute window has a relative busyness value, where 100% is the busiest quarter-hour for the given station and 0% is the quietest.
But yes, it doesn’t account for the Notting Hill Carnival or soccer matches, which is a shame.
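To show what that format looks like in practice, here’s a small sketch that maps the 96 quarter-hour values to clock times and picks out the quietest windows in the morning peak; the values themselves are made up, and fetching them from the API isn’t shown.

```python
# Sketch: interpret a day's crowding profile for one station.
# The profile is 96 relative-busyness percentages, one per 15-minute window
# (index 0 = 00:00-00:15, index 95 = 23:45-24:00); 100% is the busiest and
# 0% the quietest quarter-hour for that station. The values below are dummies.
from datetime import time

profile = [0, 0, 0, 0, 2, 5, 12, 30, 65, 100, 80, 40] + [25] * 84  # 96 values

def window_start(index: int) -> time:
    """Convert a 0-95 window index to the start time of that 15-minute slot."""
    return time(hour=(index * 15) // 60, minute=(index * 15) % 60)

# Quietest windows between 07:00 and 10:00 (window indices 28-39)
morning = [(window_start(i), profile[i]) for i in range(28, 40)]
for start, busyness in sorted(morning, key=lambda pair: pair[1])[:3]:
    print(f"{start:%H:%M} -> {busyness}% of this station's maximum")
```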
Use of API Rate Limits
Yes, these are VERY easy to hit when you’re running tests against an app. I personally cache responses that won’t change much (like the Crowding Data) in a Postgres jsonb column, and also use memcached on the server to keep a local copy.
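As a concrete illustration, the memcached layer looks roughly like this (the Postgres jsonb side works the same way, just with a longer shelf life). It assumes pymemcache talking to a local memcached; the TfL endpoint, app key and TTL are placeholders.

```python
# Sketch of a memcached read-through cache in front of the TfL Unified API.
# Assumes a local memcached instance and the pymemcache client; the app_key
# and TTL are placeholders to be tuned for your own application.
import requests
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))
APP_KEY = "your-tfl-app-key"  # hypothetical credential
TTL_SECONDS = 300             # slow-changing data (like crowding) can use a much longer TTL

def cached_get(path: str) -> str:
    """Return the raw JSON body for a TfL API path, hitting TfL at most once per TTL."""
    key = path.strip("/").replace("/", "_")  # simple key normalisation
    hit = cache.get(key)
    if hit is not None:
        return hit.decode("utf-8")
    resp = requests.get(f"https://api.tfl.gov.uk{path}", params={"app_key": APP_KEY}, timeout=30)
    resp.raise_for_status()
    cache.set(key, resp.text, expire=TTL_SECONDS)
    return resp.text

print(cached_get("/Line/victoria/Status")[:200])
```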
Hello and welcome to the forum! Thank you for your feedback and suggestions.
We would love to provide more detailed disruptions data and are certainly exploring ways to achieve this. If we provided data similar to what @briantist mentioned (i.e. delay and cancellation reasons), would that meet your needs?
This is also something we would like to improve on. Crowding data for buses would be rather different to that for London Underground, but it is something we are actively looking at.
For services that will get a lot of use, you might like to cache the data within your infrastructure. That way, if multiple users request the same data, you’re only making a single request.
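Purely as an illustrative sketch (the endpoint, TTL and in-process approach are placeholders rather than a recommendation of any particular stack), the idea is that concurrent requests for the same resource collapse into a single call to our API:

```python
# Illustrative only: an in-process TTL cache with per-key locking, so many
# simultaneous users asking for the same resource trigger one upstream request.
import threading
import time
import requests

TTL_SECONDS = 60
_cache: dict[str, tuple[float, str]] = {}   # path -> (fetched_at, body)
_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()

def _lock_for(path: str) -> threading.Lock:
    with _locks_guard:
        return _locks.setdefault(path, threading.Lock())

def shared_get(path: str) -> str:
    """Return a cached response if fresh; otherwise let exactly one caller refresh it."""
    entry = _cache.get(path)
    if entry and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]
    with _lock_for(path):                    # only one thread fetches a given path at a time
        entry = _cache.get(path)             # re-check after acquiring the lock
        if entry and time.monotonic() - entry[0] < TTL_SECONDS:
            return entry[1]
        resp = requests.get(f"https://api.tfl.gov.uk{path}", timeout=30)
        resp.raise_for_status()
        _cache[path] = (time.monotonic(), resp.text)
        return resp.text
```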
We are constantly looking to improve our open data offering. Some improvements are relatively small, and a lot of work goes into reflecting the evolving transport network (e.g. the changes we’re making to support the London Overground line naming). We also have a long-term plan to ensure our data remains fit for purpose, both for our own needs and for those of open data users. I don’t have anything to announce at this time, but any changes will be communicated on this forum or our blog.