We will be implementing this as a phased approach starting at 10:30 tomorrow morning (Thursday 26 January). Between 10:30 and 16:00 we’ll turn this on and monitor its performance.
If this goes as planned, we will make the permanent switch on Tuesday 31 January at 10:30.
To prevent any issues with using our data feeds, please make sure you have updated your tooling to support TLS version 1.2 or higher by 10:30 tomorrow morning.
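For anyone unsure whether their tooling meets this requirement, one rough way to check from Python is sketched below. It is only an illustration: the function names are made up for this example, and you would pass in whichever host serves the feed you use. It builds a client that refuses anything older than TLS 1.2 and reports the protocol version the server actually negotiates.

```python
import socket
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to host and return the negotiated protocol name, e.g. 'TLSv1.3'.

    Raises ssl.SSLError if no protocol at TLS 1.2 or above can be agreed.
    """
    context = make_tls12_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

Calling `negotiated_tls_version("<your feed host>")` on the machine that runs your tooling should return a version string if that Python installation can talk TLS 1.2 or higher. It says nothing about other tools on the same machine, which may bundle their own TLS library.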
It may be a coincidence, but at about the same time this change took place the HTTrack spidering software I used to identify new or changed data files ceased to work. From Googling the issue, it seems that HTTrack is not compatible with TLS v1.2 or higher (not that I properly understand what that means). Not the end of the world, but it would still be helpful to be able to identify new material without going through the lot! The sources I am primarily thinking of are the bus spider maps, of which there are a thousand or so, and the comparatively new bus working timetable facility, which contains several thousand files.
@mjcarchive thanks for the information. Can you describe your use-case for spidering? We’d like to avoid developers needing to do this kind of polling, crawling or scraping if we can, so if there’s a particular sort of notification you require we might be able to help out. For example, there is a mailing list for timetable/reference data updates that could provide a suitable trigger to visit our Unified API.
Thanks for the reply. I would not (yet) describe myself as a developer in any meaningful sense! In a sense this is more about what computer-literate non-developers might do with resources such as collections of PDF files, which might be updated weekly, or perhaps only occasionally. Any particular update might replace or add just a small number of files. I’m not sure this is the best place to be raising this, as it is not about interacting with the API, but here goes anyway.
An example is the collection of spider maps at Bus spider maps - Transport for London. These could be linked to from community websites, for example. A cursory inspection of the pages for individual boroughs shows that the naming convention is that there is, er, no naming convention! Possibly because different potential user needs pull in different directions:
Some include a date, though not necessarily in the same format each time; some do not. If they include a date, then any link set up from outside is going to fail when a spider map is replaced, because the file name will have changed.
On the other hand, including the date in all file names as a matter of convention makes it easy for anyone looking at the page to see whether the file they want has changed without needing to download it. Then you get the geek community (guilty, m’lud) who may want to keep copies of old files as well as download new. Hence the interest in being able to spider the pages.
Of course, simply putting up a directory listing somewhere would enable new/changed files to be identified quickly, even if the file names are unchanged, as long as the true creation dates are shown.
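To illustrate how little would be needed on the consumer side, here is a minimal sketch of comparing two such listings. It assumes, hypothetically, that each listing has already been read into a mapping of file name to modification date. No such listing exists at present, and the function name and file names below are invented for the example.

```python
from __future__ import annotations

from datetime import datetime


def diff_listings(
    old: dict[str, datetime], new: dict[str, datetime]
) -> dict[str, list[str]]:
    """Compare two directory listings (file name -> modification date).

    Returns the names that were added, changed (same name, different date)
    or removed between the old listing and the new one.
    """
    added = sorted(name for name in new if name not in old)
    removed = sorted(name for name in old if name not in new)
    changed = sorted(
        name for name in new if name in old and new[name] != old[name]
    )
    return {"added": added, "changed": changed, "removed": removed}


# Hypothetical listings taken a week apart.
last_week = {
    "map-a.pdf": datetime(2023, 1, 10),
    "map-b.pdf": datetime(2023, 1, 10),
}
this_week = {
    "map-a.pdf": datetime(2023, 1, 10),  # untouched
    "map-b.pdf": datetime(2023, 1, 24),  # replaced under the same name
    "map-c.pdf": datetime(2023, 1, 24),  # new file
}
result = diff_listings(last_week, this_week)
```

Here `result` flags `map-b.pdf` as changed and `map-c.pdf` as added, without any file being downloaded. This only works if the dates in the listing reflect when each file's content really changed.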
Which appears not to be the case for the second example, the bus working timetables available via Bus schedules - Transport for London. These do keep the same names when replaced, unlike the spider maps, but for anyone who simply wants to know what is new, it is “needle in haystack” time. This resource is updated weekly and it appears (from downloading) that files always have the date of the latest update as the creation date; in other words, the entire resource appears to be recreated every week. The metadata for each PDF file do contain information that shows a file is new (the Title field under PDF Information under Properties), but to access this you have to download the file anyway! Any directory listing would have to include this piece of metadata to be of any use.
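Where file names and dates give no clue, one fallback for someone who downloads the whole set anyway is to compare content digests between two local copies. The sketch below is only that, a sketch: the function names are invented, it assumes the files sit in a local folder, and it does not read the PDF Title field. Whether it helps here is uncertain. If the weekly rebuild embeds a fresh timestamp in every PDF, every digest will differ and the Title metadata remains the better signal.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Map each PDF's path (relative to folder) to the SHA-256 of its bytes."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(folder.rglob("*.pdf"))
    }


def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return names in `current` that are new or whose content digest differs."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )
```

A typical routine would be to take a `snapshot` of last week's download folder, take another of this week's, and pass both to `changed_files`. Storing the previous snapshot as a small JSON file avoids keeping every old copy just for comparison.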
I am not suggesting that there is a compelling business need to address either of these examples. But I do feel sometimes that the more old-fashioned ways of getting at and using data get overlooked as more and more sophisticated methods are developed. I’m a great enthusiast for the thousand flowers which open data have enabled to bloom but that can embrace low as well as high tech approaches.