It’s unquestionably useful - one of the interesting things you encounter when working with Tube data is that you know where people enter and leave the system, but what happens in between is a bit of a mystery. Having data on how loaded different routes are would be useful.
One interesting analogy I see is with network traffic management - unlike in a ‘normal’ network, where a packet’s path is determined by some kind of central controlling authority, in the TFL universe the ‘packets’ are sentient and make their own decisions about routing! The analogy gets more interesting when you consider that Tube ‘bandwidth’ is a wasting asset, just like normal bandwidth - an empty seat on a train represents a lost opportunity for someone to travel somewhere.
Let’s say I have a conventional WAN with 3 nodes - London, Dublin and New York - all of which are connected. If the London-Dublin link gets saturated but the others aren’t, the most efficient way to use the network would be to route new traffic from London to Dublin via New York instead of having it join the queue for London->Dublin. I would argue that the Tube network is the same in many ways - the optimal route for a given journey might change and become - at times - counterintuitive.
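The congestion-aware rerouting above can be sketched with a standard shortest-path search over link costs that include a congestion penalty. Everything here is illustrative - the node names come from the WAN example, but the base latencies and the delay figure are made-up numbers, and a real Tube router would use measured journey times and load factors.

```python
import heapq

def shortest_path(graph, src, dst):
    # Plain Dijkstra over effective travel costs (arbitrary "minutes").
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    # Walk back from the destination to recover the chosen route.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

def effective(base, delay):
    # Add a per-link congestion delay (queueing time) to the base cost.
    return {u: {v: w + delay.get((u, v), 0) for v, w in nbrs.items()}
            for u, nbrs in base.items()}

base = {
    "London":   {"Dublin": 10, "New York": 60},
    "Dublin":   {"London": 10, "New York": 55},
    "New York": {"London": 60, "Dublin": 55},
}

# Uncongested: the direct London->Dublin link wins.
print(shortest_path(effective(base, {}), "London", "Dublin"))
# Saturated London->Dublin link: going via New York becomes optimal.
congested = {("London", "Dublin"): 120}
print(shortest_path(effective(base, congested), "London", "Dublin"))
```

With no congestion the direct hop costs 10 versus 115 via New York; once a 120-unit queue is added to the direct link, the counterintuitive detour through New York is the cheaper route - exactly the flip the text describes.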
So to return to the question of how useful this data is: you could plausibly use it to build an app that lets users trade time for comfort by offering them different - and possibly weird - routes that use spare capacity instead of forcing people to play ‘sardines’ on trains.
Question: How accurate and predictable are load factors for trains? And once such an app is unleashed and gains acceptance, how fast could it render a historical data set useless?
Question: How close is TFL to being able to provide train loading information in real time? As a developer, if I have live data and last week’s data for the same day and time, I can make predictions about loading with a straight face. Otherwise I’m guessing…
As for the full week’s Oyster data: here at VoltDB we love large data sets with real-world data. The problem with fake data is that you’re setting yourself up for what I call “Kung Fu Villain Syndrome” - nice, well-behaved data that doesn’t represent the nastiness of the real world. We’ve found the 5% set very useful, but would much prefer a 100% set, even for a day…