And WAP dies a slow and terrible death. I opened the election service at 20:00 sharp yesterday, and about 10 seconds later everything went, not down, but really, really slow. Then the MySQL server that records statistics and serves other parts of NRK's WAP service started to complain about a lack of available database connections. Max connections setting increased, service restarted, problem solved. After a while Tomcat wanted more threads to handle incoming connections. Setting increased, server restarted.
Unfortunately, this is when I shoot myself in the foot.
Because of the load on the database with the election results (which is not the MySQL database I administer), it takes a while before a connection is established, data is returned and the connection is closed. The probably-not-that-optimised JDBC driver for Microsoft SQL Server is rather large, and making a new instance of this driver for every connection consumes a lot of memory. This wouldn’t really have been a problem if the communication with the database had completed in a few short seconds, which of course it did when I was testing the system earlier in the day. But with the huge load, requests piled up while waiting for the database, and more driver instances were alive at once than the server could handle with its allocated memory.
Say hello to OutOfMemoryError.
All this could have been solved in a couple of ways. I could have decreased the maximum number of connections to the server. I could have increased the memory allocated to the server. I could have made a connection pool instead of using a one-connection-per-request scheme. I could have stress tested the application before it went into production. But at this point, the server was working at least semi-well, so I decided to restart the server whenever it hit its memory limit. Restarting the server takes 2-3 seconds. Between restarts I was looking for a way to increase the allocated memory, but when I finally realized how dead simple it was, the load had decreased and there was really no need for any more memory.
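For the curious, here is roughly what the connection pool idea looks like. This is a minimal sketch, not the code that ended up in production: `BoundedPool`, `borrow` and `giveBack` are names I made up for illustration, and the pool is generic so it runs without a database. In the real thing the pooled resource would be a `java.sql.Connection`. The point is that the number of live resources is fixed up front, so a slow database makes requests wait instead of making the server allocate more.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool: exactly `size` resources are ever created. */
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    BoundedPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // create everything up front, never again
        }
    }

    /** Waits up to timeoutMillis for a free resource; returns null if none frees up. */
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    /** Hands a borrowed resource back so the next request can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    /** Number of resources currently free. */
    int available() {
        return idle.size();
    }
}
```

A request handler would call `borrow`, run its query, and call `giveBack` in a `finally` block. When `borrow` returns null the handler can answer with a "busy, try again" page, which is a lot friendlier than running out of memory.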
Lessons learned: Connections per request. Bad. Connection pool. Good. Using that now. No stress testing. Bad. Stress testing. Good.
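And for the stress testing part, something as small as the sketch below would probably have caught the problem before 20:00. It is an illustration only: `LoadProbe` and `countFailures` are hypothetical names, and the task you pass in is assumed to be whatever a single request does (fetch a page, run a query). It fires many copies of that task at once and counts how many fail or time out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical load probe: runs one task many times concurrently. */
final class LoadProbe {
    /** Submits `requests` copies of `task` to `threads` workers; returns the failure count. */
    static int countFailures(int threads, int requests, Callable<?> task) {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            results.add(workers.submit(task));
        }
        int failures = 0;
        for (Future<?> result : results) {
            try {
                result.get(30, TimeUnit.SECONDS); // a request slower than this counts as failed
            } catch (Exception e) {
                failures++; // task threw, timed out, or the wait was interrupted
            }
        }
        workers.shutdownNow();
        return failures;
    }
}
```

Anything thrown inside the task, including an out-of-memory error, comes back wrapped in an `ExecutionException` and is counted as a failure. Raising `threads` until the failure count stops being zero gives a rough idea of where the limit is.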