Bus working timetables - two different formats this week

@neamanshafiq

Now here’s an odd thing. The TfL Data Bucket shows some bus WTTs updated at 3am this morning and a host of others updated some time after 7am.

I can’t be sure if this is universally the case but it looks as if the two batches are in different formats. While the layout is broadly the same, the transit nodes used in the new format look quite different (an extra character, for example) while timing points at stands are denoted by the suffix T rather than S.

Extra (or revised?) day types are also shown, thus we have spSa for special Saturday instead of sSa (though sSa and SSa are still present). If the aim is to avoid confusion between specials and Summer, that’s to be welcomed.

Finally, the description of the reason for the service change is much more comprehensive, certainly better than the frequent use of the single word SCHEDULE!

All these changes have a potential impact on the processing that I do. I doubt that the format/content would have been changed deliberately without giving some warning so quite possibly the new format is not meant to be in the wild yet.

I don’t want to waste effort on processing the hybrid set at this stage when its release might be inadvertent. Would it be possible to clarify what is supposed to be in the WTT dataset now and whether there is any reversion work to be done? A summary of the format/content changes that are being made would also be welcome; I’m sure I have missed some.

The timetable pages are in a different order too, all out followed by all in rather than the previous hokey-cokey ordering.

More changes emerging.

The most aggravating feature is that the new format files have lost what is to me an absolutely key piece of metadata, namely the file title (as opposed to file name).

For example, the file Schedule_100-Bx.pdf has a Title
Standard Schedule 100-59733-Bx-LC-1-1
which contains the Service Change Number, operator, option and version. This was programmatically accessible with the right software, which meant that I could easily identify new files. (Bear in mind that all files are recreated every week so the file creation date is useless for this.) The whole check takes well under an hour.
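A minimal sketch of how such a Title could be split into its parts once it has been read from the PDF. The field order (route, Service Change Number, day type, operator, option, version) is an assumption inferred from the single example above, and `ScheduleTitle`, `TITLE_RE` and `parse_title` are hypothetical names, not anything TfL publishes.

```python
import re
from typing import NamedTuple, Optional


class ScheduleTitle(NamedTuple):
    """Fields assumed to be packed into the PDF Title (order is a guess)."""
    route: str
    service_change: str
    day_type: str
    operator: str
    option: str
    version: str


# Assumed layout, based on one example only:
# "Standard Schedule <route>-<SCN>-<day type>-<operator>-<option>-<version>"
TITLE_RE = re.compile(
    r"^Standard Schedule "
    r"(?P<route>[^-]+)-(?P<service_change>\d+)-(?P<day_type>[^-]+)-"
    r"(?P<operator>[^-]+)-(?P<option>\d+)-(?P<version>\d+)$"
)


def parse_title(title: str) -> Optional[ScheduleTitle]:
    """Return the parsed fields, or None if the title is blank or unrecognised."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return None
    return ScheduleTitle(**match.groupdict())


print(parse_title("Standard Schedule 100-59733-Bx-LC-1-1"))
print(parse_title(""))  # a blank Title yields None
```

A blank Title returns `None`, so files that carry no metadata can at least be counted rather than silently skipped.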

The new format file Schedule_100-MF.pdf has the Title field left blank. As far as I can see the Service Change Number etc can only be got at by opening the file. Maybe that can be automated but it still means opening 3500 files (this week) in order to find what may be a handful of new files. Goodness knows how long that would take - and right now I haven’t got any enthusiasm for finding out.

Other less important changes have emerged. The headnotes, footnotes and page numbers have disappeared. Timing point codes for garages have gained a G suffix (e.g. WN G instead of WN).

The extra character in the transit node code seems to be no more than adding an A to the previous code.

Two more changes worth noting.

First, route suffixes such as 25U for rail replacement extras or 281R for rugby: while these are still present within the files, they have been lost from the file name, so the files are not immediately distinguishable from the unsuffixed files such as 25 and 281. (25U is actually present now in both forms, the old 25U-sSu and the new 25-spSu. Just to add to the fun there is an old format 25-sSu as well.)
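As a rough illustration of the clash, the sketch below groups file names by route and reports any route that has both an old-style special day type (such as sSu) and its new-style counterpart (spSu). The `Schedule_<route>-<day type>.pdf` pattern is assumed from the file names quoted in this thread, and `group_by_route` and `special_clashes` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed file name pattern: Schedule_<route>-<day type>.pdf
FILE_RE = re.compile(r"^Schedule_(?P<route>[^-]+)-(?P<day_type>[^.]+)\.pdf$")


def group_by_route(file_names):
    """Map each route to the list of day types found in the file names."""
    groups = defaultdict(list)
    for name in file_names:
        match = FILE_RE.match(name)
        if match:
            groups[match["route"]].append(match["day_type"])
    return dict(groups)


def special_clashes(file_names):
    """List (route, old day type, new day type) where both sXx and spXx exist."""
    clashes = []
    for route, day_types in sorted(group_by_route(file_names).items()):
        for day_type in day_types:
            old_style = "s" + day_type[2:]
            if day_type.startswith("sp") and old_style in day_types:
                clashes.append((route, old_style, day_type))
    return clashes


names = [
    "Schedule_25U-sSu.pdf",
    "Schedule_25-spSu.pdf",
    "Schedule_25-sSu.pdf",
    "Schedule_100-Bx.pdf",
]
print(special_clashes(names))
```

This only flags candidates. Telling whether 25-spSu is really the old 25U would still mean looking inside the file, since the suffix no longer appears in the name.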

I also note that the new files distinguish between options (presumably where different structures were possible until late on) but there is no “space” for indicating versions. Different versions have been much more common than different options. Where a Boxing Day file from 2020 is reintroduced unchanged for 2021 it is typically assigned a new version number, and I think a new version is also used for minor error corrections. So there is no metadata at all in the file to indicate that a change has been made, and thus that a file is new.

I appreciate that the WTT system exists for operational purposes and that public access is very much an optional extra. On some of this I do wonder whether the change I observe is what was actually wanted for the system.

As I commented in a recent post, my site has actually been called in aid by the FOI team when people have asked for historic schedules (and I dare say there are many more requests that have not been made because my site exists). If, as I fear, my site gets frozen because it is no longer possible to update it efficiently, that could have some blowback for FOI.

@mjcarchive I’m sorry to hear about the changes - my team don’t actually produce the timetables so I’ve passed your queries re the content of the files to the relevant team. I’m investigating the issue with the two batches with different timestamps. We have a daily transfer script running at 3am so it’s odd to see files with the later timestamp.

Thanks @neamanshafiq. The later timestamp set made me wonder whether it was an accidental release of something under development.

The key thing for me is a reliable way of identifying new files without having to open them all. That could be (as before) by populating the Title field. Not replacing existing files would be even better, as the datestamp would do the job. The loss of the version number worries me from the point of view of file naming as much as anything else.
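If the Title field were populated again, the weekly check could be as simple as comparing two snapshots of file name against Title. This is only a sketch under that assumption: `new_or_changed` and `unverifiable` are hypothetical helpers, and the second title in the example (ending 1-2) is invented purely for illustration.

```python
def new_or_changed(previous, current):
    """File names whose non-blank Title is new or differs from last week's."""
    return sorted(
        name
        for name, title in current.items()
        if title and previous.get(name) != title
    )


def unverifiable(current):
    """File names with a blank Title, which cannot be checked without opening them."""
    return sorted(name for name, title in current.items() if not title)


# Snapshots map file name -> Title as read from the PDF metadata.
last_week = {
    "Schedule_100-Bx.pdf": "Standard Schedule 100-59733-Bx-LC-1-1",
}
this_week = {
    "Schedule_100-Bx.pdf": "Standard Schedule 100-59733-Bx-LC-1-2",  # made-up version bump
    "Schedule_100-MF.pdf": "",  # new format: Title left blank
}

print(new_or_changed(last_week, this_week))
print(unverifiable(this_week))
```

Keeping the blank-Title files in a separate list makes it obvious how much of the weekly set has fallen outside this check.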

I suspect that anything else can be coped with, though it may need a lot of careful code rewriting (drat, that means I’ll have to understand the original logic of something which grew like Topsy).

@neamanshafiq

I’m starting to feel as if the gaslight is being turned up and down…

When I looked at the Data Bucket at around 7pm this evening, it looked as if the files created/uploaded at 7am on Tuesday had been replaced by files created/uploaded at 3am on Wednesday. Assuming they were old format, I could start processing them after dinner.

What do I see now (9.15pm onwards)? The Wednesday 3am files have been replaced by files created/uploaded on Wednesday at around 6am - and they are new format.

So, thanks to whoever reverted to the old format files but at the moment your work was in vain.

Michael

This morning (8.45am) it is a mix of 3am files, with creation dates a mix of 30 November, 1 December and 3 December.

I think I have now downloaded a full set of “old format” schedules but it remains to be seen whether the new format will raise its ugly (well, beautiful but different) head later in the day.