Getting intermittent out of memory errors for Journey Planner API

Hi,

I’m seeing fairly frequent but intermittent 500 errors with an out-of-memory exception for quite simple journey planning requests. Is this a known issue that is being resolved?

Example:

data: {
      '$type': 'Tfl.Api.Presentation.Entities.ApiError, Tfl.Api.Presentation.Entities',
      timestampUtc: '2020-05-26T13:09:53.8667719Z',
      name: 'Internal',
      exceptionType: 'OutOfMemoryException',
      httpStatusCode: 500,
      httpStatus: 'InternalServerError',
      relativeUri: '/journey/journeyresults/1000129/to/51.454693,-0.119331?mode=overground,tube,dlr,walking',
      message: 'An internal server error occurred.'
    }

@markw

Hi. I was wondering what your start point might be, as there’s no ‘from’ in the sample listed above. I think I’ve only managed to trigger this when I had a project that was hitting the Journey Planner too often. Are you making frequent calls?

The ‘from’ is in the URI - it’s an ICS code (1000129).

This was not happening before; it seems to have started sometime since the API downtime last week.

The app hasn’t launched yet and there are only 3 people who have ever used it, not particularly frequently and only interactively, so if that’s too many calls we have a serious problem!

@markw

I’ve just tested it out with this form of the call (with my keys)

https://api.tfl.gov.uk/Journey/JourneyResults/1000129/to/51.454693,-0.119331?nationalSearch=False&timeIs=Departing&journeyPreference=LeastTime&walkingSpeed=Average&cyclePreference=None&alternativeCycle=False&alternativeWalking=True&applyHtmlMarkup=False&useMultiModalCall=False&walkingOptimization=False

and it seems to be working OK for the few calls I’ve made.

Yes, it really does seem to be intermittent. So far I can’t pinpoint a particular route, or anything about the routes, that causes the problem, but I have a whole bunch of them in my server logs.

I always wrap my server-side calls to this in a 60-second cache (using memcached) to ensure that the server isn’t making too many calls to the TfL API.

A 500 from TfL means “you are using the API too much!” - it’s how they prevent hammering of the API.

Caching such calls on the server can really help prevent this - something like the sketch below. I had to do this with the customer service thing I wrote for MTR Liz Line.
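In case it’s useful, here’s roughly the shape of that wrapper - a minimal sketch assuming Node 18+ (for the built-in fetch) and the memjs memcached client; planJourney, the cache key format and the app_key handling are just placeholders, not anything TfL-specific:

import { Client } from "memjs";

const cache = Client.create(); // defaults to MEMCACHIER_SERVERS or localhost:11211
const APP_KEY = process.env.TFL_APP_KEY ?? "";

// Illustrative wrapper: identical searches within 60 seconds are served
// from memcached, so the TfL API only sees one call per unique query per minute.
async function planJourney(from: string, to: string): Promise<unknown> {
  const cacheKey = `jp:${from}:${to}`;
  const cached = await cache.get(cacheKey);
  if (cached.value) {
    return JSON.parse(cached.value.toString());
  }

  const url =
    `https://api.tfl.gov.uk/Journey/JourneyResults/${encodeURIComponent(from)}` +
    `/to/${encodeURIComponent(to)}?mode=overground,tube,dlr,walking&app_key=${APP_KEY}`;
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`TfL API returned ${res.status}`);
  }
  const body = await res.json();

  // Keep the response for 60 seconds; after that the next caller refreshes it.
  await cache.set(cacheKey, JSON.stringify(body), { expires: 60 });
  return body;
}

Adjust the key and TTL to taste - the point is just that repeated identical searches never hit the API more than once a minute.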

Responses for the exact same search are already cached (AWS handles this for me on the API Gateway), but I am 100% confident that is not the issue here, as we can only be doing about 5-10 searches in a minute if we’re all using it at once. It sometimes happens on the first search in a session.

Also, the error specifically says it’s out of memory; it’s not a rate-limiting error. If it were, why not return an appropriate status code and description?
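For what it’s worth, our server already branches on the status code, so a rate limit would be easy to spot - rough sketch only, callTfl is just an illustrative name rather than real code from our app:

// Sketch: treat 429 as "back off and retry later" and 5xx as a genuine
// server-side failure worth logging - they call for different handling.
async function callTfl(url: string): Promise<unknown> {
  const res = await fetch(url);
  if (res.status === 429) {
    // Rate limited: the key has exceeded its quota, so wait and retry.
    throw new Error("Rate limited (429) - slow down before retrying");
  }
  if (res.status >= 500) {
    // Server error, e.g. the OutOfMemoryException above; log and report it
    // rather than assuming it was a quota problem.
    throw new Error(`TfL server error ${res.status}: ${await res.text()}`);
  }
  return res.json();
}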

Hi @markw

We have had an instance in our stack that has been struggling with memory. We usually reboot these within a few minutes, but for the last couple of days the monitoring has not been firing correctly, so it was missed for a few hours each time.

This should be sorted now, but please let us know if you see any sustained memory errors.

When we rate limit your API key, you should see a 429 (Too Many Requests).

Thanks,
James
