Socket problems?!

Ask questions, request features, or just compliment us about our software and services.
CactusRadio.com
Posts: 13
Joined: Mon Mar 04, 2002 1:10 pm
Location: Phoenix, AZ


Post by CactusRadio.com »

I have been encountering a problem for many weeks now, and just this week I seem to have narrowed it down to (or at least found it to be related to) RTB.

First:
Hardware: 500 MHz PIII, 640MB RAM
OS: Windows 2000 Server - Native-Mode Domain Controller
Software: RTB (duh), WinAmp, Shoutcast Server...

After broadcasting fine for just over two days straight, RTB stops connecting to the log service and starts reporting the local Shoutcast server as not online. The Shoutcast Server consistently starts logging "[yp_tch] error creating socket!" instead of "[yp_tch] yp.shoutcast.com touched!". I can't browse the web: I get errors opening a port. The event log starts generating UserEnv 1000 errors every five minutes, but since I have checked and tried all available remedies for those, I've concluded that these event log errors are a symptom of the resource shortage rather than its cause.

Rebooting the server clears up the problem, but only for another two days. Each time this has happened, I've stopped or killed different combinations of applications to try to narrow down the problem.

For the last two occurrences, closing RTB and reopening it instantly fixed the problem: RTB reconnects on the first try, the Shoutcast Server touches start working, the UserEnv 1000 errors stop, and I can browse the web with no problems. That's the only thing I changed on the server, and everything started working again...for another two days, anyway.

Since the pattern appears to be pretty clear, I need to ask for some help. Is it possible that RTB is creating a socket and it is not getting closed/destroyed...and over a period of days the system then gets low on resources/ports/sockets?!

Additional information: I believe my RTB .ini file setting says to check the servers every 30 seconds. I have only two servers in the list: the local Shoutcast Server, and one Shoutcast Server as a relay. (This relay has a dynamic IP which I update regularly. See other old issue.) The relay is often down for hours at a time because that computer is used for other things, too...

Please let me know if I can provide additional information or otherwise help in testing. Thanks!!!
Jay
Will work for food (Administrator)
Posts: 3020
Joined: Mon Jan 14, 2002 12:48 am
Location: Next Door

Post by Jay »

Wow, that is certainly interesting. I have RTB running on a Win2k Server machine and a WinXP machine and have yet to come across anything like that; however, I don't have any failed connections to my servers. I'll look over my code and see if I can pinpoint a possible cause.

What is the exact status of your servers when this starts happening?
- Jay
CactusRadio.com

Post by CactusRadio.com »

Crazy stuff, isn't it? I, too, suspect that it is because my relay is often down. Here's info from the only logs I have kept:

9/08 - 12:14am - rebooted, relay was down
9/08 - 03:40am - relay joins
9/08 - 07:36pm - relay drops
9/09 - 04:19am - relay joins
9/09 - 04:40pm - relay drops
9/10 - 06:05am - relay joins
9/10 - 07:42am - relay drops
9/10 - 02:14pm - errors begin
9/10 - 06:13pm - closed & reopened RTB, errors stop
(relay is down this entire time!)
9/12 - 03:26am - errors begin
9/12 - 05:03am - relay joins
9/12 - 06:39am - closed & reopened RTB, errors stop
(relay remains up)

Interesting that the most recent outage occurred quicker than all of the rest...and my relay was down the entire time...making me suspect more and more that a "down" relay is the instigator:
Quick time calculations tell me that my relay was down for about 33 hours from the time I last rebooted before the errors started. When RTB was closed and restarted, I got another 33 hours of relay "downtime" before the errors reappeared. Hmmm... 33 hours*60 minutes*2 queries/minute=3960 sockets/ports. Isn't the ephemeral port range 1024-5000? That's about the same number of ports!...and that may be "the wall" that I'm hitting, if the "Not Online" is causing a socket/port to be left open...
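The arithmetic above can be checked with a quick sketch. It assumes, as the post does, that each failed 30-second check against the down relay strands exactly one ephemeral port and that nothing ever releases it; that mechanism is unconfirmed, the helper names are hypothetical, and the port range is simply the 1024-5000 figure quoted above.

```python
# Back-of-envelope check of the port-exhaustion estimate in the post.
# Assumption (from the post, unverified): every failed check against the
# down relay strands one ephemeral port, and nothing ever releases it.

CHECK_INTERVAL_S = 30              # RTB server-check interval, per the .ini setting
RELAY_DOWN_HOURS = 33              # relay downtime observed before errors began
PORT_LOW, PORT_HIGH = 1024, 5000   # ephemeral range as quoted in the post


def leaked_sockets(hours: int, interval_s: int) -> int:
    """Failed checks (hence stranded sockets) over `hours` of relay downtime."""
    return hours * 3600 // interval_s


def hours_to_exhaustion(interval_s: int, low: int, high: int) -> float:
    """Hours of continuous failed checks before every port in [low, high] is taken."""
    ports = high - low + 1
    return ports * interval_s / 3600


print(leaked_sockets(RELAY_DOWN_HOURS, CHECK_INTERVAL_S))
print(round(hours_to_exhaustion(CHECK_INTERVAL_S, PORT_LOW, PORT_HIGH), 1))
```

With those inputs it prints 3960 stranded sockets after 33 hours and about 33.1 hours to use up all 3977 ports in 1024-5000, which lines up with the roughly 33 hours of relay downtime seen before each failure.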

Please let me know if I can provide more help/information... Thanks!!
Jay

Post by Jay »

Yes, very helpful. I think I have added the necessary cleanup operations for failed connections.
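For anyone curious what "cleanup for failed connections" means in general, here is a minimal sketch in Python. RTB's own source isn't shown in this thread, so this is not Jay's actual fix; the function name `server_is_online` and the 5-second timeout are hypothetical. The idea is that the socket gets closed on every path, including a refused connection or a timeout, so a relay that stays down cannot strand one socket per check.

```python
import socket


def server_is_online(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Hypothetical status check (not RTB's code): True if host:port accepts a TCP connection.

    The `with` block closes the socket as soon as the check succeeds, and
    create_connection closes its own socket before raising on failure, so
    no socket (or its ephemeral port) is left behind either way.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, name lookup failure, or timeout.
        # The socket created inside create_connection is already closed here.
        return False
```

A poller would call this once per server every 30 seconds; because nothing outlives the call, a relay that is down for hours costs no more sockets than one that is up.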

I am testing it now and will let you know when it's ready.

BTW you might want to join the discussion list at [email protected]

This is my preferred method of announcing small, unpublished updates and patches.
- Jay
CactusRadio.com

Post by CactusRadio.com »

Okay. I'm on that. I just didn't know if everyone else wanted to have our conversation in their e-mail box. I'll go there from now on. BTW, my touch interval seems to have changed to 300 seconds. Is this part of your test, or a necessary change? That's messing with me a bit, as some of my songs aren't getting logged.
Jay

Post by Jay »

Yeah, I'm having some issues with the server at the moment and am trying to find a remedy. I have reset the touch rate to 30 seconds.
- Jay