Server Crashing - SOLVED - We Think


#1

Game mode: Online Private Server - No Mods
Problem: Random Crashes For The Past 2 Weeks

** WE THINK IT’S BEEN RESOLVED ON OUR SERVER **

Since the patch just before the MOAP rolled out, our private server had been crashing 5-6 times a day.
It was totally random, didn’t matter how many players were connected or the time of day.
Going through the event logs in the server, each time seemed to be caused by a different error.

Various suggestions had been put forward as to the cause, the most widely recognised being the boss in the volcano. Others speculated about placing thralls into a wheel, we had our own observation regarding the destruction of buildings, and there were a few other theories.

On the evening of the patch to fix the volcano boss issue (10th), our server was still crashing after the patch went in. There seemed to be no end to it.

I had been running the database checker tool before each boot up, with zero errors found.

Then…

One of our players noticed a big problem with his base: a huge tower base was half destroyed for no apparent reason. So we decided as a community that we’d roll back the server from 10pm to 9pm GMT to see if we could restore his building.

At first we closed down the server and just renamed the backup taken 1 hour previously to game.db. This did not restore the building, and everyone was still in their original positions. We tried another backup file with the same result.

On the 3rd attempt to roll back the server, I deleted all 3 of these files:
game.db
game.db-shm
game.db-wal

and renamed just the 8.55pm backup to be the new game.db file.

Booted up the server, which rebuilt the game.db-wal and -shm files and the rollback was complete.
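
For anyone who wants to script the same rollback, here is a minimal Python sketch of the steps above. It is not an official Funcom tool. The function name restore_backup and the folder and backup file names in the usage note are made up for illustration, so adjust them to your own install. It assumes the server is fully shut down before it runs.

```python
import shutil
from pathlib import Path


def restore_backup(save_dir: Path, backup_file: Path, db_name: str = "game.db") -> Path:
    """Replace db_name in save_dir with backup_file (hypothetical helper).

    Also removes the -shm and -wal files left over from the old database,
    so the server rebuilds them on the next boot. Run only while the
    server is stopped.
    """
    target = save_dir / db_name
    if not backup_file.is_file():
        raise FileNotFoundError(f"backup not found: {backup_file}")
    if backup_file.resolve() == target.resolve():
        raise ValueError("backup file and live database are the same file")

    # Delete the live database and both SQLite sidecar files, if present.
    for suffix in ("", "-shm", "-wal"):
        stale = save_dir / (db_name + suffix)
        if stale.exists():
            stale.unlink()

    # Copy (rather than move) so the original backup is kept untouched.
    shutil.copy2(backup_file, target)
    return target
```

Usage would be something like restore_backup(Path("Saved"), Path("Saved/game_backup_2055.db")), where both paths are placeholders. Copying instead of renaming is a deliberate choice here, so the backup is still around if you need a second attempt.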

To our surprise, our server has NOT crashed in 3 days and the server has been busier than ever.

It would be nice to know what the -wal and -shm versions of the game.db file do. I suspect they are some kind of cache, but if anyone knows, please let me know.

I hope this helps someone else fix their server and maybe help Funcom find where the issue may be.


PC Hotfix Build 103818/18696 (10.07.2018)
#2

Here’s the explanation of what they are:

https://www.sqlite.org/tempfiles.html

Also, a somewhat relevant post on Stack Overflow regarding backups ->


#3

Thanks very much Leeux, very helpful links.

Having read that these files are just temporary files, as I suspected, it looks like they hadn’t been closed or deleted on our server when the server crashed.

From what I’ve read, this shouldn’t have mattered nor affected our main DB either, and deleting them shouldn’t have made any difference to the stability of our server.

It has resolved our crashing problem though…


#4

It could be possible that if those files aren’t in sync with the main database, the database would return invalid information on read that confuses the server and makes it crash.

From what I understood reading this[1], it seems that in WAL mode a portion of the actual data lives inside the WAL file, and only gets moved into the main database file after a checkpoint, which the server seems to do periodically.
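
To see that behaviour outside the game, here is a small Python sketch using the standard sqlite3 module. It has nothing to do with the actual server code. The function name wal_demo and the table name buildings are just made up for the demo. It writes a row in WAL mode, checks how big the -wal file is, then forces a checkpoint and checks again.

```python
import os
import sqlite3


def wal_demo(db_path: str) -> dict:
    """Write one row in WAL mode and report the -wal size around a checkpoint."""
    wal_path = db_path + "-wal"

    def wal_size() -> int:
        return os.path.getsize(wal_path) if os.path.exists(wal_path) else 0

    conn = sqlite3.connect(db_path)
    try:
        # Switch the database to write-ahead logging; SQLite reports the mode in use.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]

        conn.execute("CREATE TABLE IF NOT EXISTS buildings (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO buildings (name) VALUES ('tower')")
        conn.commit()  # the committed row now sits in the -wal file

        before = wal_size()

        # Move the WAL contents into the main database file and truncate the WAL.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        after = wal_size()

        rows = conn.execute("SELECT COUNT(*) FROM buildings").fetchall()[0][0]
    finally:
        conn.close()

    return {
        "journal_mode": mode,
        "wal_bytes_before_checkpoint": before,
        "wal_bytes_after_checkpoint": after,
        "rows": rows,
    }
```

Run against a fresh file on disk, the -wal file should be non-empty before the checkpoint and empty afterwards, with the row still readable. That would fit the idea that a game.db copied on its own between checkpoints can be missing recent data.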

Perhaps if the WAL file wasn’t in sync with the DB file (maybe after a crash the DB file was somehow outdated or corrupted… or after restoring a backup, where it seems to be necessary to delete those files), the data read by the server triggered a crash due to inconsistencies within the database itself.

That’s all guesswork, though… without access to the server’s source code we cannot ever be sure what the hell is happening :smiley:

[1] => https://www.sqlite.org/wal.html