November 29, 1999
Volume 52, No. 13
Crisis over -- e-mail restored
After nearly four weeks, the continuing problem with e-mail service has been identified and solved, according to Paul Morris, vice provost for information technology.
"It took a lot of late nights from four members of Emory's UNIX team and finally some breakthrough information from Georgia Tech that led us to the final solution," said Morris. Part of that solution is the purchase of a faster disk subsystem that will handle and route e-mail messages to the servers.
The problems began in late October and, ITD currently believes, resulted from two factors: the upgrade of the e-mail server's operating system to a Y2K-compliant version and the large volume of e-mail.
"When we upgraded to the 2.6 version of Solaris, it was more resource-intensive than we thought it would be, so it caused the server to run slower," said Morris. "In addition, the volume of traffic on the machine got to the critical point. What was happening is that so many messages were trying to get to the hard disk that the slower-running input/output system caused the disks to start locking up. Then the messages started queuing up because the files were locked. The whole system started slowing down in late October and then just stalled."
In late October, ITD called in the server's manufacturer, Sun Microsystems, with whom Emory has a support agreement. Staff members in the mathematics and computer science department were also called in to help figure out what was going wrong.
"We initially thought it was a software problem," said Morris. "Early on, we thought it was the POP [Post Office Protocol] server generating extra traffic. That was when we encouraged people to switch from POP to IMAP [Internet Message Access Protocol]. But that didn't help either, and a good bit of time was lost."
It turns out that Georgia Tech encountered a similar problem last year. "Because of the expertise at Tech, they don't have a service agreement with Sun Microsystems, so no one at Sun knew how Tech had fixed their problem," said John Cyran, manager of enterprise systems support. "Someone at Sun finally found out that Georgia Tech figured out it wasn't a software problem, but a hardware problem. Once they knew that, it took them a couple of hours to fix things."
"If we had known earlier about the Georgia Tech problem we could have avoided this crisis," said Morris. Sun has loaned Emory the faster disk subsystem needed to control the flow of messages on the server until the new subsystem arrives.
"The biggest loss during this e-mail crisis was lost customer time," said Morris. "I'm aware of the inconvenience that was caused by the lack of e-mail and regret it's taken us so long to solve this problem. Some faculty activities, such as an online journal, rely on e-mail services." Departments running their own e-mail servers were not affected, as the ITD mail relay service continued to work.
"After this crisis, it will take ITD a while to demonstrate reliability with e-mail services," said Morris. In addition to the new $70,000 disk subsystem, ITD has installed $20,000 of additional memory and plans to purchase a redundant stand-by machine in case the server fails. But that wouldn't have helped with this recent crisis.
One issue that emerged during the crisis -- ITD's recommendation that people move from POP e-mail browsers to IMAP -- is being rethought by ITD.
"Our thinking was that people would prefer to move to the IMAP protocol because they would be able to get to their e-mail from any machine," said Morris. "We heard from people on campus that it's not as important for people as we thought, so in the short run, ITD will not abandon POP as a service and has stabilized both POP and IMAP."