Inferring delays from Arrival Predictions

maxfishman · May 28, 2022, 2:40pm

Hi everyone,

I’m currently working on a university project in which I am building an app to predict bus delays via stats/ML, using arrival predictions as data.

The only way I can think of inferring a delay is to compare the current expected time (from the arrival predictions API) to the timetable for that same line.
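A minimal sketch of that comparison follows. It assumes you already have the predicted arrival (for example the `expectedArrival` field of an arrival prediction) and the timetabled times for the same line and stop as Python `datetime` objects. Matching a prediction to the *nearest* timetabled trip is a simplifying heuristic, not something TfL defines. It will mis-assign buses once a delay exceeds half the scheduled gap. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def schedule_delay(expected, scheduled_times):
    """Delay of a predicted arrival relative to the nearest timetabled time.

    expected: predicted arrival at the stop (datetime).
    scheduled_times: timetabled arrivals for the same line and stop (datetimes).
    A positive result means the bus is late, a negative one that it is early.
    """
    if not scheduled_times:
        raise ValueError("no timetabled times for this line and stop")
    # Heuristic: assume the bus is the timetabled trip closest to its prediction.
    nearest = min(scheduled_times, key=lambda t: abs(expected - t))
    return expected - nearest
```

For example, timetabled times of 14:00, 14:12 and 14:24 with a prediction of 14:15 give a delay of three minutes.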

I am curious if anyone has any other ideas about strategies for inferring delays or can point me to some papers about this.

Your thoughts are greatly appreciated

Best,

Max

briantist · May 28, 2022, 3:41pm

@maxfishman

I guess it depends on what you actually mean by “a delay”. There are three possible definitions in London:

1. A variance from the published timetables, i.e. not matching up to the pre-published data in https://tfl.gov.uk/tfl/syndication/feeds/journey-planner-timetables.zip;
2. A bus slipping from the published intervals, where the public timetable says something like “every 3-5 minutes”;
3. The countdown display at a bus stop showing a time due and the bus actually taking longer to arrive.

Personally, I get most wound up by the third type.

Whilst the first data set is probably the one that TfL would use against one of the contractors, it is actually hard to match up the live data against the timetables, as it is up to the bus service contractors to manage their own fleets, and they can move buses (using their registration plate as an index) against any service they have to provide.

The second one is also hard to do, as the “published intervals” are generally shown only AT bus stops, but they can be inferred from the timetables, as I’ve done here…
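Inferring an “every X-Y minutes” range from a timetable can be sketched as taking the gaps between consecutive timetabled buses at one stop. This is a simple illustration, not necessarily how the intervals above were derived. It assumes you have already pulled the times for a single stop and time band (say, the morning peak) out of the timetable data. The function name is hypothetical.

```python
from datetime import datetime


def headway_range_minutes(times):
    """Shortest and longest gap, in minutes, between consecutive timetabled buses."""
    ordered = sorted(times)
    if len(ordered) < 2:
        raise ValueError("need at least two timetabled times")
    gaps = [(later - earlier).total_seconds() / 60
            for earlier, later in zip(ordered, ordered[1:])]
    return min(gaps), max(gaps)


# Buses at 08:00, 08:03, 08:08 and 08:12: gaps of 3, 5 and 4 minutes.
times = [datetime(2022, 5, 30, 8, m) for m in (0, 3, 8, 12)]
print(headway_range_minutes(times))  # (3.0, 5.0), i.e. "every 3-5 minutes"
```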

Doing the third one might tell you something useful: where on the road network unplanned (un-timetabled) bus delays happen.

It’s quite possible to grab the whole of the bus network by looping through https://api.tfl.gov.uk/Line/_____/Route to get all the bus routes, and then checking on a minute-by-minute basis which stop predictions have changed. The prediction data is a time, not the “2 mins” shown on the display, so watching for changes to predictions should shake out the changes that are delays.
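The “watch for changed predictions” idea can be sketched as a comparison of two consecutive polls. It assumes each poll is the decoded JSON list of predictions and that each prediction carries `vehicleId`, `naptanId` and `expectedArrival` fields. The 60-second threshold and the function names are illustrative choices, not TfL conventions.

```python
from datetime import datetime, timezone


def parse_tfl_time(value):
    """Parse a TfL timestamp such as "2022-05-28T14:45:12Z" as UTC.

    The trailing "Z" and any fractional seconds are dropped first, because
    datetime.fromisoformat() rejects them on older Python versions.
    """
    base = value.rstrip("Z").split(".")[0]
    return datetime.fromisoformat(base).replace(tzinfo=timezone.utc)


def find_slips(previous, current, threshold_s=60):
    """Compare two polls of arrival predictions.

    Returns (vehicleId, naptanId, slip_seconds) for each bus/stop pair whose
    expected arrival moved later by more than threshold_s seconds.
    """
    before = {
        (p["vehicleId"], p["naptanId"]): parse_tfl_time(p["expectedArrival"])
        for p in previous
    }
    slips = []
    for p in current:
        key = (p["vehicleId"], p["naptanId"])
        if key not in before:
            continue  # new prediction: nothing earlier to compare against
        slip = (parse_tfl_time(p["expectedArrival"]) - before[key]).total_seconds()
        if slip > threshold_s:
            slips.append((key[0], key[1], slip))
    return slips
```

Aggregating the slips by stop over a day would then point at the parts of the road network where the unplanned delays build up.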

There used to be an HTTP stream at http://countdown.api.tfl.gov.uk/interfaces/ura/stream_V1, but that’s no longer something you can sign up for.

briantist · May 28, 2022, 3:46pm

Sorry, the endpoint is https://api.tfl.gov.uk/Line/{ids}/Arrivals; see APIs: Details - Transport for London - API
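A minimal way to call that endpoint from Python using only the standard library is sketched below. It assumes the `{ids}` placeholder accepts a comma-separated list of line ids (e.g. `11,23`) and that an `app_key` query parameter can optionally be supplied. Check both against the API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.tfl.gov.uk"


def arrivals_url(line_ids, app_key=None):
    """Build the Line/{ids}/Arrivals URL for one or more line ids."""
    ids = ",".join(urllib.parse.quote(i) for i in line_ids)
    url = f"{BASE}/Line/{ids}/Arrivals"
    if app_key:
        url += "?" + urllib.parse.urlencode({"app_key": app_key})
    return url


def fetch_arrivals(line_ids, app_key=None, timeout=30):
    """Return the decoded JSON list of predictions for the given lines."""
    with urllib.request.urlopen(arrivals_url(line_ids, app_key), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_arrivals(["11", "23"])` once a minute gives a sequence of snapshots whose predictions can be compared from one poll to the next.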

maxfishman · June 6, 2022, 10:50am

Thank you very much for the detailed explanation.

I have a question regarding the pre-published timetable you attached. Is there a way to request an updated version of that timetable? I noticed that one is for May 30th. Also, what is the difference between the pre-published timetables and the ones I can get from the TFL API (https://api.tfl.gov.uk/Line/{id}/Timetable/{fromStopPointId})?

Regarding delay strategies, I also agree that the third option is the most interesting. Just to clarify: I would ping the TFL arrivals API at a consistent rate to get a series of ETAs, and then, if the ETAs don’t decrease as they normally do, or even increase as time passes, I could deduce a delay?
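One way to make “the ETAs don’t decrease as they normally do” precise: if a bus is on track, the time until arrival should fall by exactly the time elapsed between polls, so *poll time + time to station* stays constant. Any change in that sum is a slip. The sketch below assumes you have, for one bus at one stop, a list of `(poll_time, time_to_station_seconds)` pairs, for instance taken from a `timeToStation` field. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def eta_slips(observations):
    """Change in implied arrival time between consecutive polls, in seconds.

    observations: (poll_time, time_to_station_seconds) pairs for ONE bus at
    ONE stop, in poll order. Positive values mean the bus fell behind its
    previous prediction; negative values mean it caught up.
    """
    # On-track buses keep poll_time + time_to_station constant.
    implied = [poll + timedelta(seconds=tts) for poll, tts in observations]
    return [(b - a).total_seconds() for a, b in zip(implied, implied[1:])]


polls = [(datetime(2022, 6, 6, 14, m), tts)
         for m, tts in ((0, 600), (1, 540), (2, 540), (3, 420))]
print(eta_slips(polls))  # [0.0, 60.0, -60.0]
```

Here the implied arrivals are 14:10, 14:10, 14:11 and 14:10: the ETA stayed at nine minutes for one poll (a one-minute slip) and then the bus made the time back up.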

Finally, I was wondering if there is a way to check when the bus actually arrives at a station. With arrival predictions, I just assume that the timestamp of the final prediction for a given prediction ID is the actual arrival time. In other words, the point at which the predictions for a specific bus at a specific station end is taken as the actual arrival time. However, I’m not sure if this is an appropriate assumption.
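The “last prediction = actual arrival” assumption can be sketched as keeping, for each bus/stop pair, the final prediction seen before it dropped out of the feed. As a sanity check (a heuristic of ours, not an API feature), it flags pairs whose last prediction was still well in the future when it vanished, since the bus may have been curtailed or withdrawn rather than arriving. The tuple layout, the two-poll tolerance and the function name are assumptions. Note that polling only pins the real arrival down to within one polling interval.

```python
from datetime import datetime


def infer_arrivals(observations, poll_interval_s=60):
    """Estimate arrival times from the last prediction seen for each bus/stop.

    observations: (poll_time, vehicle_id, naptan_id, expected_arrival) tuples.
    Returns {(vehicle_id, naptan_id): (estimated_arrival, plausible)}.
    """
    obs = sorted(observations)
    if not obs:
        return {}
    latest_poll = obs[-1][0]
    last = {}
    for poll_time, vehicle_id, naptan_id, expected in obs:
        # Later polls overwrite earlier ones, leaving the final prediction.
        last[(vehicle_id, naptan_id)] = (poll_time, expected)
    result = {}
    for key, (seen, expected) in last.items():
        if seen == latest_poll:
            continue  # still predicted in the newest poll, so not arrived yet
        # A prediction that vanished while still well in the future suggests
        # the bus was curtailed or withdrawn rather than arriving.
        plausible = (expected - seen).total_seconds() <= 2 * poll_interval_s
        result[key] = (expected, plausible)
    return result
```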

Thanks again.

mjcarchive · June 6, 2022, 4:54pm

@maxfishman
I am not clear which pre-published timetables you are referring to: the ones in the zip file (which are complete timetables) or the stop-specific timetables (as for 11 and 23 in Brian’s post), which only give approximate frequencies for high-frequency periods of operation and are best thought of as summaries. Nor is it clear whether the ones you get from the API are complete or summaries.

The “high frequency” point is important as TfL monitoring of those is basically of the service intervals, not whether a bus is where it should be. In theory every bus on a route could be 20 minutes late but the service intervals (and thus the service as perceived by passengers) could be fine. Such a situation might indicate a substantial delay but the delay may have happened two hours previously and subsequently cleared.
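The point can be illustrated numerically by comparing each bus against its timetabled time and each gap against its timetabled gap. This sketch assumes matched lists of actual and scheduled arrival times at one stop, in minutes after some reference time. It is a simple illustration, not TfL's actual monitoring measure, and the function name is hypothetical.

```python
def deviations(actual, scheduled):
    """Schedule deviations and headway deviations for consecutive buses at one stop.

    actual, scheduled: matched arrival times in minutes after a reference time.
    """
    schedule_dev = [a - s for a, s in zip(actual, scheduled)]
    actual_gaps = [b - a for a, b in zip(actual, actual[1:])]
    scheduled_gaps = [b - a for a, b in zip(scheduled, scheduled[1:])]
    headway_dev = [a - s for a, s in zip(actual_gaps, scheduled_gaps)]
    return schedule_dev, headway_dev


# Every bus 20 minutes late, but still evenly spaced.
print(deviations([20, 26, 32, 38], [0, 6, 12, 18]))
# ([20, 20, 20, 20], [0, 0, 0])
```

Every bus is 20 minutes off the timetable, yet every interval matches, so an interval-based check sees nothing wrong.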

While they are related to some extent, I am interested not only in whether my bus is going to arrive at my stop when it is supposed to, but also in whether I am likely to be delayed once on the bus. In some ways the latter is more important, as I have a range of apps which tell me when the bus is coming with pretty good reliability. In an ideal world a screen on the bus would mimic what motorway signs do and cycle through information like “North Finchley in 15 minutes”, all based on current traffic conditions. Well, I can dream…
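Something close to the “North Finchley in 15 minutes” display could be approximated from the arrival predictions themselves. This assumes the feed holds predictions for the same vehicle at stops further along its route, since each prediction is per vehicle and per stop. The sketch looks up `timeToStation` for one vehicle at one downstream stop. The function name and any vehicle or stop identifiers are hypothetical. The result reflects TfL's prediction model rather than live traffic directly.

```python
def minutes_to_stop(predictions, vehicle_id, naptan_id):
    """Predicted whole minutes until vehicle_id reaches naptan_id, or None.

    predictions: decoded Arrivals JSON, each item with at least 'vehicleId',
    'naptanId' and 'timeToStation' (seconds) keys.
    """
    for p in predictions:
        if p["vehicleId"] == vehicle_id and p["naptanId"] == naptan_id:
            return round(p["timeToStation"] / 60)  # nearest whole minute
    return None
```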

Low-frequency services (buses less often than every 12 minutes) are monitored in the traditional way, against where the bus should be according to the timetable.
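The split between the two monitoring approaches can be written down as a small rule using the 12-minute boundary given above, treating exactly 12 minutes as high frequency. The function name and the idea of feeding it a scheduled gap in minutes are our own framing.

```python
def monitoring_basis(scheduled_headway_min):
    """Return which yardstick applies to a service with this scheduled gap."""
    # Buses at least every 12 minutes: high frequency, judged on service intervals.
    if scheduled_headway_min <= 12:
        return "headway"
    # Less often than every 12 minutes: judged against the timetable.
    return "schedule"
```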

While they are probably not useful to you, as they are not directly reusable electronically and only show major timing points, there are other sources of timetables in matrix form. TfL publish every single one of their bus working timetables in pdf form - see Bus schedules - Transport for London. There is also the London Bus Routes site, which contains html and pdf timetables. These come closest to the timetables TfL used to post at stops before they decided that the London public was only capable of digesting times at one stop - see London Bus Routes.

maxfishman · June 8, 2022, 1:39pm

Hi Michael,

Thank you for your explanation.

I agree about the importance of knowing the likelihood of a delay after getting on the bus. I will try to use the arrival predictions with forecasting models, among other things, to generate more accurate journey predictions…hopefully.
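As one possible starting point for such forecasting (a baseline we are suggesting, not a method from this thread), simple exponential smoothing predicts the next travel time between two stops as a weighted average that favours recent observations. It assumes you already have a chronological list of observed travel times, e.g. derived from inferred arrival times at consecutive stops. The default `alpha` of 0.3 is an arbitrary illustrative value.

```python
def exponential_smoothing(values, alpha=0.3):
    """One-step-ahead forecast from simple exponential smoothing.

    values: observed travel times in chronological order.
    alpha: weight on the newest observation, in (0, 1].
    """
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialise the smoothed level with the first value
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level
```

A larger `alpha` reacts faster to sudden congestion but is noisier. Fitting separate levels per time of day is a natural next step before moving to heavier ML models.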