Archive › April, 2011

Important Update Released for Nerds Backup

KineticD recently released an update for the NerdsBackup program that caused NerdsBackup to stop running and the Start Service button to stop working.  Only clients who have Block Out hours enabled are affected.

An update that resolves this issue was released quickly, but you will need to force the update (on affected machines only) by doing the following:

1. Double-click the diamond-shaped NerdsBackup icon.

2. Click Options.

3. Go to the Block Out tab.

4. Remove all Block Out hours.

5. Shut down NerdsBackup.

6. Restart NerdsBackup.

7. Click the Start Service button.

8. The software will request the update.

9. Set your Block Out hours again.

We apologise for any inconvenience that this has caused.  For further assistance with this or anything else related to NerdsBackup please email backupservice@nerdsonsite.com.

Sony admits utter PSN failure: your personal data has been stolen

We posted this story to Facebook and Twitter earlier today, but it’s worth repeating.

Sony has finally come clean about the “external intrusion” that has caused the company to take down the PlayStation Network service, and the news is almost as bad as it can possibly get. The hackers have all your personal information, although Sony is still unsure about whether your credit card data is safe. Everything else on file when it comes to your account is in the hands of the hackers.

Sony had this to say regarding customer credit cards:

Their advice is to be safe rather than sorry: “If you have provided your credit card data through PlayStation Network or Qriocity, out of an abundance of caution we are advising you that your credit card number (excluding security code) and expiration date may have been obtained.”

In other words, CANCEL your credit card and have a new one issued, because the possibility exists that your information was stolen, and in these sorts of situations you don’t want to take a chance.

If you would like more information, read the Ars Technica article here or the CNET article here.

Nerds On Site Emergency Response System

Before last week’s widely reported Amazon outage, we used email to deliver outage alerts and reports to affected team members worldwide, as well as to clients.

Since part of our mail infrastructure lived at Amazon’s Northern Virginia data centre, those alerts failed, and only those of you who kept up with our Twitter accounts or blogs were informed.

We recognize the need for push delivery of alerts covering not just our own outages, service windows, and infrastructure issues, but also those of the cloud providers we are connected to, including Amazon, Google, etc.

To this end, we ask that you follow the process outlined below in order to be notified of system outages.

Our commitment is to provide NormalSpeak alerts about outages of any Software-as-a-Service solution we provide, as well as related services that may affect our clients and team members worldwide.

We have chosen Twitter for these alerts, using a brand-new account dedicated to alerts only. This account is:

http://twitter.com/nosalerts

The rest of this article goes through how to enable SMS delivery, so that you are alerted by a text message sent right to your mobile phone rather than depending on email alone. Here is the step-by-step process if you have never used Twitter.

  1. Sign up for a Twitter account at http://twitter.com/
  2. Log into your newly created (or existing) Twitter account.
  3. Go into Account Settings and then the Mobile tab, enter your country and mobile number, then click Save.
  4. Verify your phone as instructed on-screen.
  5. Once your mobile phone is verified, choose the kinds of text message notifications you wish to receive.
  6. Finally, visit http://twitter.com/nosalerts and enable the text-messaging option.

REMINDER: Upcoming IMAP Port Change

This is a reminder of our upcoming IMAP Port Change. Starting at 10 PM ET on May 31, 2011, our mail servers will no longer accept IMAP connections that are not secured by SSL. All of our IMAP clients will need to use SSL-secured IMAP on port 993 in order to continue using our IMAP service.

Please contact hosting@nerdsonsite.com if you have any concerns or questions about this change.
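
For anyone who wants to check a setup against the requirement above, here is a minimal sketch in Python. It assumes the change means "SSL on port 993 and nothing else"; the function names are our own for illustration, and any hostname you pass in is up to you (your actual mail server name is not given here).

```python
import imaplib
import ssl

IMAPS_PORT = 993  # SSL-secured IMAP port named in the announcement


def settings_still_work(port: int, use_ssl: bool) -> bool:
    """Return True if these client settings match the announced requirement.

    Assumption: only SSL on port 993 is accepted after the change.
    """
    return use_ssl and port == IMAPS_PORT


def open_secure_imap(host: str) -> imaplib.IMAP4_SSL:
    """Open an SSL-wrapped IMAP connection on port 993 (needs network access)."""
    context = ssl.create_default_context()  # verifies the server certificate
    return imaplib.IMAP4_SSL(host, IMAPS_PORT, ssl_context=context)
```

A mail client configured for plain IMAP on port 143 would fail `settings_still_work(143, False)`. Calling `open_secure_imap` with your mail server's hostname is one way to confirm that an SSL connection on port 993 can be established.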

Amazon Outage and How We’ve Been Affected

Over on our Hosting Blog, Jonathan (Online Services POD Leader) has a very detailed account of what happened with Amazon and how Nerds On Site services were affected. Here are some excerpts from the article:

“[On early Thursday morning], Amazon’s cloud server (known as AWS) suffered a major and ongoing outage that took down many small and large web services around the world.  The NPR has a great, easily understood explanation of what happened…”

“By [Thursday] afternoon, we had restored all mail services.  Because we were forced to rebuild our mail systems from a day-old backup, a few clients may experience a few lost emails. This is unfortunate, and we do apologize for this inconvenience.”

“This epic event showed our team that our basic recovery plans were sound and did allow us to recover relatively quickly from this outage.  In addition, we have used this event to make adjustments to our disaster recovery procedures to ensure that the next time such an event occurs we will have significantly less downtime.”

Read the rest of the article on our Hosting Blog.

Nerds On Site hosting clients can email hosting@nerdsisp.com for support with their hosting, as well as with any questions or concerns, though we strongly encourage clients to get in touch with their Nerd first, who can communicate with our Hosting Team on their behalf.

Temporary Hosting Support Contact Info

Those of you following our status page or our Twitter account will know that all shared email services were restored yesterday afternoon, after the Amazon AWS outage caused so much damage. Unfortunately, many of our corporate systems (including @nerdsonsite.com email) are still down, and so many of our standard support routes are unavailable.

The Nerds On Site Hosting Team has set up a temporary email address so that all hosting clients can email us for support. You can email hosting@nerdsisp.com right now if you have questions.

The Amazon Outage…And the Lessons Learned

At 1:49am EDT yesterday, Amazon’s cloud server (known as AWS) suffered a major and ongoing outage that took down many small and large web services around the world. The NPR has a great, easily understood explanation of what happened:

Major websites including Foursquare and Reddit crashed or suffered slowdowns Thursday after technical problems rattled Amazon.com’s widely used Web servers, frustrating millions of people who couldn’t access their favorite sites.

Read the rest of their article here: http://n.pr/fAMEoG. At the time of this writing, thousands of sites are still affected, as Amazon still has not resolved their issues. Interested clients can visit the Amazon status page directly to see a current status report of the Amazon issues.

While the Amazon outage did not affect any of our hosted websites, nor any of our dedicated or cloud server clients, it did take down our shared mail systems. As we employ third-party monitoring on all of our systems, our team was immediately alerted to the Amazon issues, even before Amazon itself acknowledged there was a problem. We immediately began working with Amazon to find the issues, and quickly discovered that the problem was not isolated to our mail servers but affected wide swaths of the Internet. At that point our team had no choice but to wait on Amazon to provide the expected resolution.

When it became apparent that the problems at Amazon were getting worse, and not better, our team began reviewing our options, and decided to rebuild our mail systems from our last backup. We keep regular nightly backups of our mail systems in a different physical location with Amazon, and thus our backups were unaffected by the Amazon outage. By yesterday afternoon, we had restored all mail services. Because we were forced to rebuild our mail systems from a day-old backup, a few clients may experience a few lost emails. This is unfortunate, and we do apologize for this inconvenience.

Some clients may wonder if there was anything that we could have done to prevent this Amazon outage from affecting our mail systems. While the exact cause and after-action report from Amazon are still not available (they are still trying to fix the problem), some facts are known, and on balance, the general consensus among Internet experts is that this Amazon outage was unprecedented and was nearly impossible to plan for:

This morning, multiple availability zones failed in the us-east region. AWS broke their promises on the failure scenarios for Availability Zones. It means that AWS have a common single point of failure (assuming it wasn’t a winning-the-lottery-while-being-hit-by-a-meteor-odds coincidence). The sites that are down were correctly designing to the ‘contract’; the problem is that AWS didn’t follow their own specifications.

You can read the rest of this informative article here: http://bit.ly/hPOjkH. However, just because this event was unprecedented and a complete failure on Amazon’s part doesn’t mean that our team didn’t learn valuable lessons. Frequently, it is the largest of failures that teach the best of lessons.

This epic event showed our team that our basic recovery plans were sound and did allow us to recover relatively quickly from this outage. In addition, we have used this event to make adjustments to our disaster recovery procedures to ensure that the next time such an event occurs we will have significantly less downtime.

  1. Our team will begin taking more frequent backups of our systems. This will allow us to recover from a more recent backup should such a failure ever occur again.
  2. We have changed the way we build our mail systems to allow for a quicker recovery in future events. In this case, it took our team 5 hours to rebuild the mail systems; with these changes already implemented, we expect that future re-builds will take less than an hour.
  3. We have updated our disaster recovery documentation to reflect the lessons learned during this outage to ensure that our team has the latest and best procedures to follow for future events.
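
To illustrate the first point, here is a minimal sketch in Python of how backup frequency limits the amount of mail that can be lost. The backup times and the four-hour interval are hypothetical (the new backup schedule is not stated above), and the calendar date is assumed for illustration; only the 1:49am outage time comes from this article.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Return the time between the last backup before a failure and the failure."""
    latest = max((t for t in backup_times if t <= failure_time), default=None)
    if latest is None:
        raise ValueError("no backup was taken before the failure")
    return failure_time - latest


# Outage began at 1:49am; the calendar date is assumed for illustration.
failure = datetime(2011, 4, 21, 1, 49)

# Hypothetical schedules: one nightly backup versus one every four hours.
nightly = [datetime(2011, 4, 20, 2, 0)]
four_hourly = [datetime(2011, 4, 20, 2, 0) + timedelta(hours=4 * i) for i in range(6)]

nightly_loss = data_loss_window(nightly, failure)
frequent_loss = data_loss_window(four_hourly, failure)
print(nightly_loss, frequent_loss)
```

Under these assumed schedules, the nightly backup leaves almost a full day of mail at risk, while the four-hourly schedule leaves under four hours.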

While our team is very disappointed that our clients had to suffer through another outage, we are pleased that we were able to recover services relatively quickly. As we write this, over 16 hours since we recovered our services, major sites like reddit.com are still not fully recovered. As always, clients can watch our Trust Site for details on current uptime, and subscribe to our Twitter feed for regular updates on all problems.

Enterprise Email Solutions

In the aftermath of the latest email outage, caused by Amazon’s epic failure, some clients may be wondering what email solutions they can consider that offer significantly greater reliability than a basic shared email solution.

Nerds On Site does offer a number of enterprise-grade email solutions, none of which were in any way affected by the recent Amazon outage. Our team can customize a solution for your business needs, with options for enterprise grade spam filtering, anti-virus filtering, archiving, collaboration and mobile support.

Contact your Primary Nerd to learn more about the solutions we can offer, or call us 24×7 at 1-877-MY-NERDS.

Amazon Issues

Like thousands of companies around the world, we host many of our cloud services with Amazon. Currently, Amazon is experiencing crippling issues in its US-East datacenters, which are affecting thousands of websites and services around the globe. While all of our client websites remain up and unaffected, our client email systems are down because of Amazon’s issues. Amazon is aware of the problem and is working to resolve it.

Amazon Still Down…And Counting…

For 12 hours now, Amazon’s cloud service (known as AWS or EC2) has been down, taking with it thousands of web services, including some of the most popular in the world. This outage has affected our services as well, taking down our client email systems. Unfortunately, the Nerds On Site team has absolutely no control over this, and is in the same boat as thousands of other web teams around the world.

12 hours into this outage, Amazon is not able to provide an estimated time for a fix, saying:

“A number of people have asked us for an ETA on when we’ll be fully recovered. We deeply understand why this is important and promise to share this information as soon as we have an estimate that we believe is close to accurate. Our high-level ballpark right now is that the ETA is a few hours. We can assure you that all-hands are on deck to recover as quickly as possible. We will update the community as we have more information.”

Our team continues to monitor the situation, and we’ll keep you updated via this blog and our Twitter account: http://twitter.com/nerdshosting.
