The TransXchange files normally appear in advance of the change, which makes sense given that Journey Planner users could be planning a journey two weeks hence. The 414 file, for example, has been there for a couple of weeks. So I know the WTT currently there does not match. I’m not sure in any case whether the basic course is the same, or - if it is - at what point they might diverge.
Oh yes, what do I do with the PDFs? I make them accessible on a website, London Bus Timetable Graveyard, along with whatever earlier versions have been made available by TfL in the past. The FOI team has actually directed those inquiring about old schedules to this site, so I think it is quite useful to them that it exists. Without it they would have continued to get a never-ending stream of requests for old schedules.
The website has links to the TfL site for current timetables and elsewhere for older ones. Trouble right now is that the latest WTT update seems to have overwritten the previous week’s new files with older ones, so the links to the current WTTs from my site don’t currently do what they say on the can. I could do the same as for the older ones but I should not have to rewrite the code to cope with that kind of system failure at the TfL end!
Having got the PDFs, I thought it would be “interesting” to see how much subsequent processing was needed to turn them into good old-fashioned timetables in Excel. First step - OCR the new files (they are not image PDFs) using a bulk process. Second step - write VBA code to reorganise the data, pick up stop names and so on. Then, having produced workable Excel files, compare each with the previous version to see what has changed (in a lot of cases the actual times have not). Some of that feeds into someone else’s timetable website, which I know is quite widely used, though I know there are more important (and earlier) inputs into that site.
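For anyone curious what the comparison step amounts to, here is a rough sketch in Python rather than the VBA I actually use. It assumes each timetable has already been reduced to a dictionary of trips, each trip being a list of (stop name, time) pairs. That layout, the function name diff_timetables and the trip labels are made up for illustration and are not how the Excel files are really organised.

```python
def diff_timetables(old, new):
    """Compare two timetables given as {trip_id: [(stop_name, hhmm), ...]}.

    The dictionary layout is an assumption made for this sketch only.
    """
    # Trips that exist only in the new version
    added = sorted(new.keys() - old.keys())
    # Trips that exist only in the old version
    removed = sorted(old.keys() - new.keys())
    # Trips present in both versions whose stops or times differ
    changed = sorted(t for t in old.keys() & new.keys() if old[t] != new[t])
    return {"added": added, "removed": removed, "changed": changed}
```

If all three lists come back empty, the new WTT has the same times as the old one, which is the common case mentioned above.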
I skated over the first (OCR) step, as that only works approximately in a lot of cases. However, I was able to write VBA code which identified and overcame the most common OCR errors. A few need manual intervention - perhaps 1 in 40 or so - but it is usually obvious what has to be done.
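To give a flavour of the clean-up, here is a minimal Python sketch (again, not my VBA). The look-alike substitutions (letter O for zero, l or I for one, and so on), the four-digit HHMM format and the function name clean_time are all assumptions for the sake of the example; the real list of common errors is longer and specific to these files.

```python
import re

# Assumed look-alike substitutions; illustrative only, not the real error list.
LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

def clean_time(token):
    """Return a 4-digit HHMM string, or None if the token needs a manual look."""
    # Swap letters that OCR commonly confuses with digits
    candidate = token.strip().translate(LOOKALIKES)
    # Drop separators and stray spaces that OCR may introduce
    candidate = re.sub(r"[.:\s]", "", candidate)
    if re.fullmatch(r"\d{4}", candidate):
        hours, minutes = int(candidate[:2]), int(candidate[2:])
        # Assumes a plain 24-hour clock; anything else is flagged
        if hours <= 23 and minutes <= 59:
            return candidate
    return None
```

Anything that comes back as None is left for manual intervention instead of being guessed at.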
All very sad, of course. Each step provided a significant but not insuperable challenge. I carried on because I did not want to be beaten! Like Mallory and Everest, I did it because it is there!
One final point worth mentioning: for the oldest schedules, TfL made available a huge zip file of WTTs in XML form, going as far back as 2006 in some cases. These were relatively easy to read into Excel, without errors. I ended up producing PDF versions for the website.
The vast majority of this is automated. It’s not like processing machine-readable live feeds, but a lot can be done with older technology sometimes.