Monday, December 1, 2008

Why did my server stop responding?

Hey there, time for yet another discourse on some annoying thing or other that's happened to me lately. This time it was my web server yo-yo-ing (is that how you spell it?) between a responsive state and an unresponsive state. Well, you say, there are many reasons that this could happen; it could be anything from a rogue agent to a faulty NIC, and right you would be. Thus began the process of elimination:

1. Check that there isn't a scheduled agent that is going nuts - nope.
2. Check that there isn't an agent on another machine that is hitting the server - nope
3. Check it's not the indexer task - nope
4. AMgr - nope
5. Any bizarre stats - wait, what's with all the http connections?

So, that was the first real clue. Whenever the problems started, it seemed that the HTTP connections were building very fast from a small number, say 10 or 20, right through the roof to a max of 600 (the server limit). Then, just as suddenly, they would disappear and everything would come good. It turned out that was because they were timing out. We also couldn't figure out why the timing seemed to be random. Sometimes it would happen every 10 or 15 minutes, and sometimes it could be a day or so between events. Very frustrating.

So, looking at it, the big clue was HTTP connections. Alright then, let's look at some HTTP and firewall logs. Well, it seems there are lots of connections (it's a very, very busy server which hosts 30+ websites), but there is nothing that really stands out. There is no single IP address, or even a range of addresses, that is an order of magnitude larger than the others.

As it turned out, it wasn't the number of connections from a particular address on its own; it was that combined with what they were doing. What was happening was that some developers (I won't name them or their company here, but can confirm that they had absolutely nothing to do with my company, they were just using us as a good source of data) were testing some code which downloaded some (reasonably large) PDFs and then matched them against an existing baseline to find changes, or something like that. That code was implemented on 5 or 6 development machines, and was supposed to run once a day. Unfortunately whoever set it up "accidentally" set it to run every MINUTE. So, that's why the timing was so random. The developers weren't running the code the whole time; sometimes they were on leave, or logged off, or running a different VM instance or whatever.
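For anyone chasing something similar, here is a rough sketch of the kind of log crunching that would have surfaced this sooner: instead of counting connections per address, group requests by address and URL, then look at how many bytes each pair pulled and in how many separate minutes it was active. It assumes the web log is written in Common Log Format (yours may differ), and the function names and thresholds are made up for illustration; this is not what I actually ran at the time.

```python
import re
from collections import defaultdict

# Assumed Common Log Format:
# host ident user [timestamp] "METHOD path PROTOCOL" status bytes
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Collect request count, total bytes and active minutes per (ip, path)."""
    stats = defaultdict(lambda: {"requests": 0, "bytes": 0, "minutes": set()})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that doesn't fit the assumed format
        entry = stats[(m["ip"], m["path"])]
        entry["requests"] += 1
        entry["bytes"] += 0 if m["size"] == "-" else int(m["size"])
        # Timestamps look like 01/Dec/2008:10:15:02 +1100;
        # the first 17 characters identify the minute.
        entry["minutes"].add(m["ts"][:17])
    return stats


def repeat_downloaders(stats, min_minutes=3, min_bytes=1_000_000):
    """Return (ip, path, requests, bytes, minutes) for heavy repeat fetchers.

    The thresholds are illustrative guesses, not tuned values.
    """
    hits = [
        (ip, path, s["requests"], s["bytes"], len(s["minutes"]))
        for (ip, path), s in stats.items()
        if len(s["minutes"]) >= min_minutes and s["bytes"] >= min_bytes
    ]
    # Biggest byte totals first
    return sorted(hits, key=lambda h: h[3], reverse=True)
```

To use it, open the log file, pass the file object to summarize(), and hand the result to repeat_downloaders(). A handful of machines each fetching the same large PDF minute after minute should float to the top, even though none of them looks unusual on connection count alone.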

So, what did I learn from this - nothing much. I suppose the big one was that if you know there is a problem, and you can't find it, just keep digging - you will be right and people will think you're a genius, or you will be wrong - either way at least you're something.

This led me to wonder what's the most annoying, tedious, hard to find tech problem that you out there in reader land have had. Leave a comment and let us know.

Saturday, November 15, 2008

LDAP and Sametime Slow

Sorry it's been a bit of a while, but I've been on holidays and haven't had the time to update the blog. Anyway, just another curiosity with the Sametime server setup. I moved the server from pointing to a test LDAP server running on VMware with Win 2003 (32 bit) and Domino 8.0.2 to an old LDAP server we had in production running (believe it or not) Windows NT and Domino 6.5.4. It works, but what seemed to be happening is that the Sametime LDAP service was slowing down the other things that were using the LDAP. For example, we also have a Lotus Workplace Collaborative Learning (WCL) server running which uses that LDAP for its authentication services. The login times for that product went up dramatically.

Anyway, I pulled Sametime back to the test box and everything went back to normal. We are setting up an 8.0.2 LDAP server to replace our old NT one, so it will be interesting to see if that has the same problem when the other apps are pointing to it. I'll try to keep you posted.

Monday, October 27, 2008

Getting Sametime and Quickr working together

Sametime and Quickr - working together.

I've spent the last few days trying to get Sametime and Quickr to talk. It's one of those things that I've been meaning to look at for a while, but I finally got some VM space to set up a new Sametime 8.0.1 server in the same internet and Notes domain as my existing test Quickr 8.1 server. Great, I thought, just pop in Sametime and click the buttons, Bob's your uncle, it should be up in an hour or so. Worst case, I'll have to look at this (which is a great resource for anyone wanting to do this, I wouldn't have gotten as far as I have without it).

I had an existing Sametime server, so what I wanted to do was just shut it down and reuse the server ID on my new VM. So, away I went. New install of Domino 8.0.2, used the same server.id, fired it up, worked like a charm. Shut down and installed ST 8.0.1, fired it up: nightmare. I was getting bootstrap errors in the admin servlet. I hunted through the servlet.properties file and the domino.properties file, to no avail. I just couldn't see anything wrong. Anyway, I went off to the trusty ST forum on notes.net and found this:

http://pdcvsdevdwb01.ca.vic.gov.au/LotusQuickr/qsite/Main.nsf/h_RoomHome/4df38292d748069d0525670800167212/?OpenDocument

After following it through I managed to tame the beast and got it up and running. I'd really like to know how those lines went missing in a new install though!

Following that I went to the existing Quickr server and set up its bits to point to the new ST box. No joy. I kept at it, and eventually I got the magical green dot next to my name on Quickr. It's taken hours and lots of fiddling. It was all working great, but now it's stopped again.

Next thing is to get all the existing pilot Sametime users pointing to the new box.

Wednesday, October 22, 2008

Welcome to Domino Delivers

Hi there and welcome to the Domino Delivers blog. I suppose a little bit about why we're here might be in order.

My name is Nigel Roulston and I'm a Lotus Notes and Domino guy. There you go, I said it, loud and proud!

I've been an avid reader of and occasional commenter on the Lotus Blogosphere for a long time and I thought it was probably about time to put in my own $0.02.

I started working with Notes a long time ago - far longer than I really care to admit - way back when there were paintings on the cave walls and the web wasn't even a glimmer in Tim Berners-Lee's eye. The Notes version was 2 and I'd never seen anything like it. Lotus, yes actually Lotus, not IBM, was owned and run by Mitch Kapor, 1-2-3 was the flagship and Microsoft had just released Windows 3.0 or 3.1 or something. At that time I was working with the Australian Bureau of Statistics as a grunt level programmer hacking C code and PL/I. We very quickly moved from 2 onto Notes 3 (2 was never rolled out in the organisation), and we were at the time the largest installation of Notes in the world. Boy are things different now!

Since then, I've worked in private and government organisations as an in-house programmer, a programmer for hire, a consultant, and in management, so I've seen the good and the bad of most things Notes.

So, what's this blog going to be about? I'm not really sure at the moment. I think it'll be a mix of some Notes/Domino techy stuff along with a bit of general discussion. I'll try to keep the personal stuff out of it, but I'm sure some will creep in somewhere along the way. I might even use it as a forum to link to some of my other interests like photography, but on the whole, I'll try and make it pretty Notes and Domino centric.

Well, I think that's enough for a first post. Now I'm off to ponder what I'll do for the next one.