I am a Master’s degree student in Statistics, working on an important project on the TfL network.
I am having trouble using the API tools and would like to ask for your help in accessing the following data:
Travel time between all pairs of stations (or at least adjacent ones)
Distances between adjacent stations
Correspondence between Route_id and the stations served by each route.
I would also like to use this data in R, so I need it in a format that R can read.
As I said, I am having a lot of trouble with the API, as this is my first time working with this kind of data.
Could you explain in more detail how I could get all the data I am looking for? At the moment I can only retrieve it one item at a time.
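One way to get the route-to-station correspondence in bulk, rather than one request at a time, is to loop over the lines and save everything as a single long-format CSV, which R loads directly with read.csv(). Below is a minimal Python sketch, assuming the Unified API's Line/{id}/Route/Sequence/{direction} endpoint returns an orderedLineRoutes list whose entries carry a name and a naptanIds list. Check those field names against a real response before relying on them; the function names and output columns are illustrative.

```python
import csv
import json
import urllib.request

# Assumed Unified API endpoint pattern; verify against the TfL API documentation.
ROUTE_SEQUENCE_URL = "https://api.tfl.gov.uk/Line/{line_id}/Route/Sequence/{direction}"


def fetch_route_sequence(line_id, direction="outbound"):
    """Download the route-sequence JSON for one line (needs network access)."""
    url = ROUTE_SEQUENCE_URL.format(line_id=line_id, direction=direction)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def route_rows(line_id, sequence_json):
    """Flatten one response into (line_id, route_name, position, station_id) rows.

    Assumes 'orderedLineRoutes' entries carry 'name' and 'naptanIds' fields.
    """
    rows = []
    for route in sequence_json.get("orderedLineRoutes", []):
        stations = route.get("naptanIds", [])
        for position, station_id in enumerate(stations, start=1):
            rows.append((line_id, route.get("name", ""), position, station_id))
    return rows


def write_rows(path, rows):
    """Write rows as CSV with a header row, readable in R via read.csv(path)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["line_id", "route_name", "position", "station_id"])
        writer.writerows(rows)


def export_routes(line_ids, path):
    """Fetch every listed line and save all route/station rows to one file."""
    all_rows = []
    for line_id in line_ids:
        all_rows.extend(route_rows(line_id, fetch_route_sequence(line_id)))
    write_rows(path, all_rows)
```

For example, export_routes(["bakerloo", "victoria"], "route_stations.csv") followed by read.csv("route_stations.csv") in R; the line IDs are only examples. A long format (one row per route and station) is easy to reshape in R later.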
@TomT
I am no expert on this, but some time ago I found a website that you may find helpful. It seems the University of Leeds is (or was?) working on projects similar to yours.
This is a great resource! I have a question about it.
Some journey links have a WaitTime tag and some don’t. Could you explain how that logic works? Does the absence of a WaitTime tag at a station mean a wait time of less than 1 minute?
Hi. Where there is no WaitTime, it means there is no wait time built into the timetable. Of course, if passengers need to be picked up or dropped off, the service will still wait long enough for that to happen. However, in the case of request stops, the service may well simply pass by the stop.
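In practice, this means a missing WaitTime can be treated as zero when adding up timetabled link times. Below is a minimal Python sketch, assuming RunTime and WaitTime are given as ISO 8601 durations such as PT1M30S. The parser is deliberately simplified to hours, minutes and seconds, and reading the values out of the XML is left to you.

```python
import re

# Matches simple ISO 8601 durations such as "PT45S", "PT2M" or "PT1H5M30S".
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(text):
    """Convert a simple ISO 8601 duration to seconds; None or "" counts as zero."""
    if not text:
        return 0
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def link_seconds(run_time, wait_time=None):
    """Timetabled time for one link: RunTime plus WaitTime (absent means 0)."""
    return duration_seconds(run_time) + duration_seconds(wait_time)
```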
Station dwell times are not available in our TransXChange data. However, train operators are required to wait for at least the station’s minimum dwell time, and to depart only when the signal is green and it is safe to do so.
Is there a way to get the dwell time for a station? Or would I be better off assuming a minimum amount (say 2 mins) for each station unless there is a wait time parameter in the timetables?
I also did a manual timing exercise with the good old stopwatch on my phone. Most stations had a dwell time of around 20 to 25 seconds, while some bigger stations such as Queen’s Park took around 40 seconds.
That would be a good approximation to model. My question now is which stations have the 20-second dwell time and which have 40 seconds.
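A simple way to encode this approximation is a per-station dwell lookup with a default, with the dwell added at every intermediate stop when summing adjacent running times. Below is a sketch under those assumptions. The 20 s default and the 40 s value come from the stopwatch figures above. The set of long-dwell stations is a hypothetical placeholder until you know which stations belong in it. The running times are also placeholders.

```python
# Stopwatch-based approximations: ~20 s at most stations, ~40 s at bigger ones.
# Which stations count as "bigger" is a hypothetical choice to be refined.
DEFAULT_DWELL_S = 20
LONG_DWELL_S = {"Queen's Park": 40}


def dwell_seconds(station):
    """Approximate dwell at a station, falling back to the default."""
    return LONG_DWELL_S.get(station, DEFAULT_DWELL_S)


def trip_seconds(stations, run_times, start, end):
    """Estimate the time from departing `start` to arriving at `end`.

    stations:  ordered station names along one direction of a line.
    run_times: run_times[k] is the running time (s) from stations[k] to stations[k+1].
    Dwell is added only at stops strictly between start and end.
    """
    i, j = stations.index(start), stations.index(end)
    if i >= j:
        raise ValueError("start must come before end in this direction")
    running = sum(run_times[i:j])
    dwelling = sum(dwell_seconds(s) for s in stations[i + 1:j])
    return running + dwelling
```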
One method I can think of would be to take one of the “Common User Format” timetables, which are equivalent to the PDF Working Timetables but in a data-friendly format. While TfL no longer publishes new versions of these CUF timetables, old versions are still available at http://timetables.data.tfl.gov.uk and contain substantially more information than the TransXChange timetables that are still published. Something like a dwell time at a station isn’t likely to change very often, so this data may still be of use to you despite being a little dated.
In each compressed timetable, you’ll find a file ending in “EVT.csv”, which contains every timetabled “event”, one per row. Using the station ID and the arrival/departure times, you should be able to calculate an average dwell time for every station over the course of each day.
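The averaging step might look like the minimal Python sketch below. The column names station_id, arrival and departure and the HH:MM:SS time format are placeholders, not the real EVT.csv schema, so map them to the actual columns described in the technical specification. Rows missing either time are skipped. A departure that is earlier than its arrival is assumed to cross midnight.

```python
import csv
from collections import defaultdict


def to_seconds(hhmmss):
    """Convert 'HH:MM:SS' (or 'HH:MM') to seconds after midnight."""
    parts = [int(p) for p in hhmmss.split(":")]
    while len(parts) < 3:
        parts.append(0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def mean_dwell_by_station(lines, station_col="station_id",
                          arr_col="arrival", dep_col="departure"):
    """Average (departure - arrival) in seconds per station.

    `lines` is an open file or any iterable of CSV lines with a header row.
    Column names are hypothetical placeholders for the real EVT.csv columns.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        arr = (row.get(arr_col) or "").strip()
        dep = (row.get(dep_col) or "").strip()
        if not arr or not dep:
            continue  # e.g. the first or last stop of a trip
        dwell = to_seconds(dep) - to_seconds(arr)
        if dwell < 0:
            dwell += 24 * 3600  # assume the stop spans midnight
        totals[row[station_col]] += dwell
        counts[row[station_col]] += 1
    return {station: totals[station] / counts[station] for station in totals}
```

With a real file, something like `with open(path, newline="") as f: mean_dwell_by_station(f)` would do.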
The data bucket linked above also includes a technical specification PDF document, which explains what each column and the values within it mean.
You may find it useful to use intertube to collect this data, since it saves you the heavy lifting of converting TrackerNet’s individual platform arrival boards into individual trains that you can track along the line. In particular, the TRAIN-STATION-STOP and TRAIN-TRANSIT events may help you calculate dwell times at stations.
You can find more info about the intertube API here, but note that you would need to contact @eta to get access. I’m sure that if you ask very nicely you may even be able to get a historical dump of data from her rather than waiting for TrackerNet to start working again.
Both of these methods should also assist with one of the objectives you mentioned in your original post, as you would also be able to calculate the travel time between all pairs of stations.
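Once you have one train's ordered stops with an arrival and departure time at each, the all-pairs travel times for that trip follow directly. The time from station i to station j is the arrival at j minus the departure from i. Below is a minimal Python sketch, assuming times are already converted to seconds on a single monotonic clock. Averaging over many trains and exporting to CSV for R are left out.

```python
from itertools import combinations


def pairwise_travel_times(stops):
    """All-pairs travel times along one train's trip.

    stops: ordered list of (station, arrival_s, departure_s) tuples, times in
           seconds. The first stop's arrival and the last stop's departure
           are never used.
    Returns {(origin, destination): seconds} for every pair served in order.
    """
    times = {}
    # combinations() keeps the input order, so origin always precedes destination.
    for (origin, _, departure), (destination, arrival, _) in combinations(stops, 2):
        times[(origin, destination)] = arrival - departure
    return times
```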
I’d also suggest looking at the PDF versions of the working timetables, as they each contain both the running time and the physical distance between all pairs of adjacent stations.
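After you transcribe the adjacent distances from a working timetable, you can get the distance between any two stations on the same line and direction from a running (prefix) sum. Below is a sketch with placeholder values; the units are whatever the PDF uses.

```python
from itertools import accumulate


def pairwise_distances(stations, adjacent):
    """Distance between every ordered pair of stations on one line.

    stations: ordered station names along the line.
    adjacent: adjacent[k] is the distance from stations[k] to stations[k+1].
    """
    if len(adjacent) != len(stations) - 1:
        raise ValueError("need exactly one distance per adjacent pair")
    # cumulative[k] is the distance from stations[0] to stations[k].
    cumulative = [0, *accumulate(adjacent)]
    return {
        (stations[i], stations[j]): cumulative[j] - cumulative[i]
        for i in range(len(stations))
        for j in range(i + 1, len(stations))
    }
```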