Designing the Canthan Written Language

A Brief History of the Canthan Written Language

When we first sat down to design the Canthan language as preproduction on Guild Wars: Factions® began, we had actually planned to create both a written and a spoken language, and early documentation was authored with a full vocabulary, rules of grammar, and more. But as is often the case with ambitious creative endeavors, we soon realized that the level of work and fidelity that would be required to achieve our goals quickly grew far beyond the project’s scope, and so we scaled the plan back to supporting a written language of around one hundred symbols.

We aimed to create three sets of thirty-two symbols, with each symbol (called a logogram) representing a word, concept, or idea that would have the most usefulness to the world-building needs of the expansion. Logographic systems can have complicated and diverse usages, and we set out to keep ours very simple and clear so that it would put the minimum amount of strain on the team, both for the artists creating the assets that would use these symbols and for the quality assurance testers who would need to vet their usage.

Stylistically we chose a calligraphic approach using variable thickness of strokes and an expressive, though somewhat rigid, aesthetic. This style was chosen to emphasize the nature of Cantha and its people: wonderfully spiritual and openhearted, but with a great respect for order and structure. Just as with the environment, which drew inspiration from a number of real-world analogues, Canthan symbols were not based on any one real-world language or writing system but were developed to feel more like a “cousin” to any of them, with their own ancestry and evolution grounded firmly in the fictional world of Tyria.

When Guild Wars: Factions was released, the symbols that made it into the game (we finished sixty-four of the planned ninety-six) were well received by players as a wonderful way to immerse themselves in our fictional world even more, and this would inspire members of the development team to continue pursuing written languages as a way of strengthening world building across the franchise.

New Canthan

As we set out to include the Canthan written language in Guild Wars 2: End of Dragons™, we realized that the differing needs of various teams would quickly increase the scope of our task far beyond what our small team of volunteers could handle, as well as place an unmanageable burden on our QA teammates. Menus, books, manuscripts, and the like needed blocks of text about many different subjects, which would not only require a staggering number of new characters to be created, but also a grammatical structure that would then need to be learned and double-checked—that is, if we wanted to stay true to the original vision of a translatable Canthan written language.

Enter: Non-translatable Canthan.

Non-translatable Canthan is based on hangul, the Korean alphabet, which uses a system of vowel letters and consonant letters to create syllabic blocks. This system was chosen as a relatively easy way to create many new symbols that looked distinct from those that are translatable and yet still a part of their family.

There are 5 vowel letters and 10 consonant letters:

[Image: the vowel and consonant letters]

These must be combined in one of two ways in order to create words:

1. Consonant + vowel

[Image: example of a consonant + vowel block]

2. Consonant + vowel + consonant

[Image: example of a consonant + vowel + consonant block]

While not as complex as hangul, which has additional combinations, non-translatable Canthan nevertheless produced enough new letters to allow us to fill Guild Wars 2: End of Dragons with a beautiful written language that felt grounded and representative of a forward-facing Cantha.
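To get a feel for how many distinct blocks those two patterns allow, a quick count helps. The sketch below assumes that every consonant may appear in either position and that no combination is excluded, which the post does not confirm. The letter labels are placeholders, not actual Canthan glyphs.

```python
from itertools import product

# Placeholder labels; the real glyphs appear only in the post's images.
VOWELS = [f"V{i}" for i in range(1, 6)]        # 5 vowel letters
CONSONANTS = [f"C{i}" for i in range(1, 11)]   # 10 consonant letters


def syllable_blocks():
    """Enumerate blocks for both patterns, assuming nothing is excluded."""
    cv = list(product(CONSONANTS, VOWELS))                 # consonant + vowel
    cvc = list(product(CONSONANTS, VOWELS, CONSONANTS))    # consonant + vowel + consonant
    return cv, cvc


cv, cvc = syllable_blocks()
print(len(cv), len(cvc), len(cv) + len(cvc))   # prints: 50 500 550
```

Under those assumptions, fifteen letters yield 550 possible blocks, which is consistent with the post's point that a small alphabet can fill a lot of in-game text.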

Creating New Words

While the non-translatable Canthan would do wonders for filling in books, we still felt the need for a few new logographic words that players could learn to recognize and translate. The new set of symbols for our return to Cantha had several goals. We wanted to maintain some of the shape language created in Guild Wars: Factions®, as well as focus on words that would get a lot of use on props in our expansion. We created some symbols for various NPCs that would appear, as well as things you would expect to see on neighborhood signs such as “open” or “welcome.”

The symbol for “open” specifically used inspiration from “air,” “light,” “up,” “night,” and “day.” If you look closely at those earlier words that are tied to concepts of open space and times of day, you begin to see that the smaller accent mark can be used to emphasize direction or sun placement. Thus, we went with something that means approximately “outside air.” And when you are opening a building, what are you doing but letting in the outside world?

“Welcome” continues some of those same concepts. With this symbol we wanted to say not only is the location open, but that the opening is done with affection. To achieve this concept, we took the symbol for “open” and combined it with a recurring piece found in symbols such as “love,” “brother,” and “sister” to indicate the additional concept of treating another as family.

We will start sharing more of these new symbols before Guild Wars 2: End of Dragons arrives!

[Image: Canthan language chart]


View the full article

Mighty Teapot’s Ultimate Beginner’s Guide: Gear and Attributes

ArenaNet Partner MightyTeapot is back with another guide to help you navigate the fascinating and sometimes very deadly world of Tyria!

The latest Ultimate Beginner’s Guide will get you up to speed with an introduction to your characters’ gear and attributes, with detailed information on choosing armor, weapons, upgrades, and attribute combinations.

To view subtitles in English, German, French, or Spanish, turn on closed captioning and select a language in the video’s settings.


Guild Wars 2 Weekly Events Schedule: July 20 – July 25

Our summer plans for Guild Wars 2 are packed with events! Check our official site on Mondays for a schedule of the week’s activities, and follow our Twitter, Instagram, and Facebook accounts for news and updates. Don’t forget—you’ll always find important information, Guild Wars 2 discussions, and community resources on our official forums.

If you’re new to Guild Wars 2 or returning to Tyria after a long journey, browse the #GW2Info hashtag on Twitter for helpful tips! Have you gathered knowledge and wisdom on your adventures? Use the hashtag to share your own tips with the community.

Return to the Living World Story: “The Head of the Snake”

Catch up on the story of the dragon cycle and prepare for Guild Wars 2: End of Dragons. Log in to Guild Wars 2 between July 20 and July 26 to unlock this week’s spotlight episode for free! Check your in-game mailbox for a letter containing a story unlock token. You’ll find the unlocked episode under “Living World™ Season 3” in your story journal tab, which is located in the Hero panel. If you’ve already unlocked the episodes previously, you don’t need the tokens—you’re good to go!

New achievements for the episode are available in the “Bonus Events” category of your achievements tab. You can complete these at any time to progress the Seasons of the Dragons meta-achievement and work toward earning legendary rewards.

In addition to the new meta-achievement, there are plenty of existing achievements to earn, rewards to collect, and secrets to find in the episode itself. Here are some handy links to our official wiki to get you started.

Caudecus Beetlestone’s depraved White Mantle faction continues its efforts to bring Kryta to heel. Her Royal Majesty, Queen Jennah of Kryta, requests your presence in Divinity’s Reach—help her root out corruption in the Ministry before crisis strikes.

Requirements: You can unlock the episode without upgrading your account, but to play it you’ll need to own Guild Wars 2: Heart of Thorns™
Locations: Divinity’s Reach, Lake Doric
Valuable Treasures: Jade Shard
Meta Achievement: “The Head of the Snake” Mastery
Daily Achievements: Daily Lake Doric

World vs. World Weeklong Bonus Event—July 23

Jump in to WvW from July 23 to July 30 for a week of bonuses! You’ll receive a 100% bonus to World Experience, a 25% bonus to reward-track progress, and a 50% bonus to magic find when playing in WvW.


Get the Dangerously Delightful Devil-Rending Axe Skin

  • Devil-Rending Axe Skin

An eerie aura and dangerous, dagger-sharp spikes give this axe some seriously sinister style.

Unlock “The Head of the Snake” for Free
Log in to Guild Wars 2 between July 20 and July 26 to unlock the Living World Season 3 episode “The Head of the Snake” for free. You’ll need to own Guild Wars 2: Heart of Thorns™ to access the content, but free Guild Wars 2 accounts can unlock it.

What’s in Stock

We’re refreshing our seasonal back item inventory later this week. Stop by the Gem Store and treat yourself!

Our popular Copper-Fed Salvage-o-Matic, Silver-Fed Salvage-o-Matic, and Runecrafter’s Salvage-o-Matic are 20% off for a limited time. Pick them up and you’ll never run out of convenient ways to destroy your stuff in search of valuables.

Returning This Week
30% Off—White Mantle Outfit, White Mantle Glider, and White Mantle Appearance Pack
20% Off—Roadrunner Raptor Skin, Shrine Guardian Jackal Skin, and Grand Lion Griffon Skin

Available Now in the Gem Store!

Log in to Guild Wars 2 and press 'O' to access the Black Lion Trading Company for these great offers and more!


Guild Wars 2: End of Dragons First Look Livestream

Save the date! Next week on July 27, join us for an in-depth look at our third expansion, Guild Wars 2: End of Dragons™.

We’ve got a tidal wave of fun content lined up, including expansion features, a new trailer, details on the story and setting, elite specialization beta information, interviews with the development team, and more.

The stream begins at 8:00 a.m. Pacific Time (UTC-7) with our preshow, hosted by ArenaNet Partner BirdOfChess.

Program

Twitch:

  • 8:00 a.m. Pacific Time (UTC-7)—Preshow

YouTube, Twitch, and Facebook:

  • 8:45 a.m. Pacific Time (UTC-7)—15-Minute Countdown
  • 9:00 a.m. Pacific Time (UTC-7)—Guild Wars 2: End of Dragons First Look Livestream

Stay tuned for the postshow following the livestream.

We’ll celebrate with giveaways during the stream. Subscribe to our events on Facebook, YouTube, and Twitch to be notified when the streams go live!


Guild Wars 2 Weekly Events Schedule: July 12 – July 18

Our summer plans for Guild Wars 2 are packed with events! Check our official site on Mondays for a schedule of the week’s activities, and follow our Twitter, Instagram, and Facebook accounts for news and updates. Don’t forget—you’ll always find important information, Guild Wars 2 discussions, and community resources on our official forums.

If you’re new to Guild Wars 2 or returning to Tyria after a long journey, browse the #GW2Info hashtag on Twitter for helpful tips! Have you gathered knowledge and wisdom on your adventures? Use the hashtag to share your own tips with the community.

Return to the Living World Story: “A Crack in the Ice”

Catch up on the story of the dragon cycle and prepare for Guild Wars 2: End of Dragons. Log in to Guild Wars 2 between July 13 and July 20 to unlock this week’s spotlight episode for free! Check your in-game mailbox for a letter containing a story unlock token. You’ll find the unlocked episode under “Living World™ Season 3” in your story journal tab, which is located in the Hero panel. If you’ve already unlocked the episodes previously, you don’t need the tokens—you’re good to go!

New achievements for the episode are available in the “Bonus Events” category of your achievements tab. You can complete these at any time to progress the Seasons of the Dragons meta-achievement and work toward earning legendary rewards.

In addition to the new meta-achievement, there are plenty of existing achievements to earn, rewards to collect, and secrets to find in the episode itself. Here are some handy links to our official wiki to get you started.

Aurene is a baby dragon, and she needs you to provide guidance, enrichment, and probably some smoked fish. But Taimi has some urgent news about the potential of Elder Dragon magic, so travel north to investigate Bitterfrost Frontier—very near Jormag’s territory—while your new friend takes a well-deserved nap.

Requirements: You can unlock the episode without upgrading your account, but to play it you’ll need to own Guild Wars 2: Heart of Thorns™
Locations: Auric Basin, Hoelbrak, Bitterfrost Frontier
Valuable Treasures: Fresh Winterberry
Meta Achievement: “A Crack in the Ice” Mastery
Daily Achievements: Daily Bitterfrost Frontier


Haunt the Deep and Harrow the Heavens with the Archdemon Chest

  • Ghost of the Deep Spaulders Skin

Shroud yourself in the ghostly glory of the deep sea with this eerie shoulder armor.

Archdemon Chest

Inside each chest, you’re guaranteed to find a redeemable Black Lion Statuette, a Fine Black Lion Dye Canister—Green, and two common items. You also have a chance to find something rarer in the fifth slot, including special items, glyphs, and skins from the Branded Weapon Collection and Lovestruck Weapon Collection.

Archdemon Wings Backpack and Glider Combo
Intimidate and intrigue with these sharp ember-imbued wings. Don’t forget—if you’re not feeling the orange glow, you can dye your new backpack and glider with hundreds of fashionable colors.

Arcane Battlestaff Skin
This sleek staff is mysterious in the hands of any caster and perfectly balanced for melee combat.

Unlock a Living World Season 3 Episode for Free

Prepare for Guild Wars 2: End of Dragons and catch up on Living World™ episodes you’ve missed! Log in to Guild Wars 2 before July 20 to unlock “A Crack in the Ice” for free. Learn more about Living World Return in this blog post.

What’s in Stock

Embrace the great outdoors! We’re refreshing our inventory of unlimited gathering tools later this week, so stop by the Gem Store before you explore.

Returning Today
25% Off—Braham’s Wolfblood Outfit, Braham’s Wolfblood Pauldrons, and Braham’s Bitterfrost Frontier Pack

Returning This Week
25% Off—Glacial Chair, Glacial Glider, Glacial Logging Tool, Glacial Mining Tool, Glacial Harvesting Tool, and Watchwork Mining Pick

Available Now in the Gem Store!

Log in to Guild Wars 2 and press 'O' to access the Black Lion Trading Company for these great offers and more!


The Twisted Marionette and Legendary Armory Are Here

Today’s update is live, with quality-of-life improvements for legendary owners and risks to life and limb for anyone brave enough to be in the stomping radius of a giant machine. Read the update notes on our forums for all the details.

Learn more about the Legendary Armory in this blog post. Don’t forget to check your gear before you head to Eye of the North to challenge the Twisted Marionette!


Inside ArenaNet: Live Game Outage Analysis

Hi! I’m Robert Neckorcuk, Platform Team Lead for ArenaNet. My team runs a number of back-end network services for the Guild Wars® franchise: log-in servers, chat servers, websites, and more. We work very closely with two other teams, Game Operations and Release Management, as the back-end services group. We also work with Gameplay, Analytics, Customer Support, and so on, but our core focus is maintaining the live state of the games and their infrastructure. The Guild Wars franchise is recognized for its impressive online availability. Although we celebrate the success of our infrastructure, we have also encountered challenges. Today I’ll be sharing the details of the last major incident and the lessons learned for us to continue improving our commitment to you.

At 6:00 a.m. Pacific Time on Monday, May 11, 2020, I got a phone call. I can assure you, I didn’t have to look up this date—it’s still a pretty vivid memory for me. Our Game Operations team had gotten a page that a live Guild Wars 2 database had rolled back, players were getting inconsistent information from the game servers, and our internal tools were displaying a slew of other related alarms and errors.

Ah, kitten.

I’ve had less exciting Mondays. In fact, I prefer less exciting Mondays. Logging on to my work machine, I joined the folks investigating both what had happened and what we could do to return the game to its normal state. For a team and company whose goal is “always online,” our investigation led to our worst fear—we would have to shut down the game to restore the data to a good state. As this was my first time with an incident like this, it took a while to identify which approvals were needed. Just after 9:00 a.m., I submitted the changes to disable the game log-ins and shut down the servers for Guild Wars 2 in our European region.

Constantly Iterating

During this service interruption, Guild Wars 2 was down in our European data centers for just over 20 hours. Outside of a handful of minutes-long blips, the previous Guild Wars 2 shutdown happened August 23, 2016. While it is never fun to deal with live-impacting issues, we are able to use these incidents as learning opportunities to improve our infrastructure and processes. Our incident process follows that of many live-service companies, both in and outside of the games industry:

  • Identify the scope of the issue—is it happening in one server, one data center, or one build? See if we can narrow our investigation to a smaller set of services or regions.
  • Identify a path to get the service running as soon as possible—this is a trade-off between correctness and availability. If we get the service working by hacking something together, will we break something else?
  • Once the service is running again, write up the incident report—this report should detail what happened, what informed us of the issue, what steps we took, and who was impacted.
  • Later, during regular business hours, set up a meeting with key stakeholders to review the incident report and create tasks for any follow-up items—action items could be to improve alerting or detection, update documentation or automated processes, or build a new tool.

It’s not only during incidents that we look at our processes and procedures and make changes. In 2017, ArenaNet made the decision to migrate from an on-site data center to a cloud-based one, utilizing the AWS infrastructure (with zero downtime!). In 2020, after adjusting to remote work, we completed a migration of our analytics-logging pipeline from a hardware-based provider into a cloud provider.

The reason for all these cloud migrations is the offer of flexibility: we have the option to constantly change and modify our infrastructure quickly and seamlessly. In early 2020, we kicked off a holistic, internal AWS upgrade, looking at the costs and options for every server we were running, from our development commerce servers to our Live PvP scheduling servers. AWS is constantly providing new services and new hardware types, and we hadn’t performed a full audit since 2017. This planning would see us making changes to improve the player experience in the live game, matching our development environments to the live game, and improving several back-end tools. We also updated a few servers that had never changed (or even been restarted) since our 2017 migration!

When tackling a large project, one of the most effective strategies is to break it down into smaller, more manageable chunks…like slicing up a big pizza! We wanted to go over every AWS instance, and fortunately already had several “buckets” of different instance types—game servers, database servers, commerce servers, etc. Our plan was to go through each of these one at a time, make the changes, then move on to the next group. For each set of changes, we would start by migrating our internal development servers. We’d write a runbook to document the process and verify the behavior, then migrate our staging and live environments following the runbook steps.

For any product and any company, the ability to iterate is absolutely key. Build stuff in your own backyard, bash it to pieces, break it as often and as strangely as you possibly can. Once you’ve fixed everything internally, it should be pretty resilient when it hits the public view. For the Platform and Game Operations teams, this “practice the way we play” process is yet another tool in our belt that aids our service quality. All that being said, the development and live-game servers are vastly different in terms of cost and scale, so as much as we wish everything behaved the same, they are quite unique.

Some (greatly simplified!) background on our databases: we have two servers, a primary and a secondary. When we write to the database, a message with “do this change” is sent to the primary instance. The primary then sends a message to the secondary (or mirror) with “do this change,” and then it applies the change to itself. If “bad things” happen to the primary, the secondary instance will automatically take over and report as the primary. This means there’s no downtime or loss of data, and the new primary will queue messages to send to the “new secondary” once it recovers from its “bad things.” To upgrade instances, we manually disconnect the primary and secondary databases, upgrade the secondary, and reconnect them.
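The write path described above can be modeled in a few lines. This is a toy sketch only: the class and method names are made up for illustration, and it is not how any particular database product implements mirroring.

```python
from collections import deque


class MirroredPair:
    """Toy model of a primary/secondary database pair (not a real database API)."""

    def __init__(self):
        self.primary = {}        # data held by the current primary
        self.secondary = {}      # data held by the current secondary (mirror)
        self.connected = True
        self.queue = deque()     # changes the secondary has not received yet

    def write(self, key, value):
        # Forward the change to the mirror (or queue it), then apply it locally.
        if self.connected:
            self.secondary[key] = value
        else:
            self.queue.append((key, value))
        self.primary[key] = value

    def reconnect(self):
        # Drain queued changes in order so the mirror returns to parity.
        self.connected = True
        while self.queue:
            key, value = self.queue.popleft()
            self.secondary[key] = value

    def swap_roles(self):
        # Planned swap or automatic failover: the mirror becomes the primary.
        self.primary, self.secondary = self.secondary, self.primary


pair = MirroredPair()
pair.write("gold", 10)
pair.connected = False       # manual disconnect before upgrading the secondary
pair.write("gold", 25)       # queued while the mirror is offline
queued = len(pair.queue)
pair.reconnect()
in_parity = pair.primary == pair.secondary
print(queued, in_parity)     # prints: 1 True
```

The model makes one property easy to see: swapping roles is only safe once the queue is empty, because anything still queued exists on one side only.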

One of the important data points we can track is how long it takes the secondary to receive all the queued messages from the primary and return to data parity. Combined with other load and health metrics, this can inform us whether the server is healthy or not (and I personally find graphs quite fascinating). We then swap the primary and secondary, rinse and repeat, and now we’ve got even more pretty graphs, along with a fully upgraded environment.

In the dev environment, everything ran smoothly. Our staging environment provided us another chance to run through the process end-to-end, and that also ran smoothly. We started our live-game upgrade with the European-region servers on Wednesday, May 6, 2020.

Where the Bad Things Are

On May 6, the actual upgrade went exactly to plan. We severed the connection, upgraded the secondary, and reestablished a connection. The queued messages copied over, we swapped primary and secondary, and repeated those actions once more. The message copy was a little slower than our practice runs in the dev environment, but there was a much larger queue given the traffic volume of the live game.

Thursday, May 7, we got a “disk space almost full” alarm. We had recently made an additional backup of the data, which took up some space, but best practices dictated we expand the disk volume regardless. Storage is inexpensive, and with AWS, a few mouse clicks could double the size of our log and backup drive. A member of the Game Operations team clicked his mouse a few times, yet the expanded storage size did not magically appear. On Friday we submitted a support ticket to AWS, who recommended a restart to correct the issue. As this server was currently the primary, we would have to swap with the secondary before we could restart.

It’s very easy to look back on things and recognize the obvious mistakes, but at the time we didn’t know what we didn’t know. Friday afternoon we attempted to swap the primary and secondary so that we could perform a quick restart. At the time, however, the message queue from the primary to the secondary was not empty, and it was still copying slowly. Some calculations told us the hard drive would not completely fill up over the weekend, so we could return Monday morning, dive into the queued messages, swap the servers, and perform the restart.

We certainly returned on Monday…

While we recognized that the queued messages were sending slowly (think “turtle speed”), we failed to take into account the speed with which the queue was growing (think “jetpack turtle speed”!). Recall the somewhat accurate statement that if “bad things” happen to the primary database, the secondary will automatically take over as the new primary. As it so happens, one of the “bad things” that causes an automatic failover is this message queue growing too large.
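The difference between the drain rate and the net growth rate is easy to put into numbers. The function below is a rough sketch, and every figure in the example is hypothetical: the post does not give the actual queue depth, message rates, or failover threshold.

```python
def hours_until_threshold(depth, incoming_per_hr, drained_per_hr, threshold):
    """Hours until the queue reaches the threshold, or None if it is not growing."""
    if depth >= threshold:
        return 0.0
    net_growth = incoming_per_hr - drained_per_hr
    if net_growth <= 0:
        return None              # the queue is holding steady or shrinking
    return (threshold - depth) / net_growth


# Hypothetical numbers, chosen only to illustrate the calculation.
eta = hours_until_threshold(
    depth=1_000_000,             # messages queued on Friday afternoon
    incoming_per_hr=500_000,     # new messages from live traffic
    drained_per_hr=100_000,      # messages the secondary manages to absorb
    threshold=25_000_000,        # depth that would trigger automatic failover
)
print(eta)                       # prints: 60.0
```

A queue that is visibly draining can still be sixty hours from its limit if traffic adds messages faster than they are delivered, so an estimate like this has to start from the net rate.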

At 2:41 a.m. on Monday, May 11, the “bad things” threshold was crossed, and the databases swapped primary and mirror. As the database update messages were all in a queue on the other server, when the databases failed over, suddenly players experienced time travel (and not the good kind) as all their progress since Friday evening was wiped away. Three painful hours later, at 5:40 a.m., our Game Operations team was called as the player reports increased. I was called at 6:00 a.m., and the game was shut down by 9:00 a.m.

As soon as the game was down, one of our first steps was to restart the primary server, and as predicted, the expanded disk volume appeared! Good news, if nothing else. On the newly expanded disk, we found the most recent automatic backup file and logs containing all the queued messages, up to 2:41 a.m.

It is excellent practice to save backups. Storage is cheap, having the option to recover data is guaranteed peace of mind, and it is very easy to schedule automated backups. Actually restoring data using that backup, however…that’s a different story, mainly because we don’t do it very often. On the one hand, there was no load on the databases, so we could tear everything down and start over if needed. On the other hand, the game was down, and we were feeling immense pressure to restore the data quickly and accurately to get the game running once again.

(Side note: I’m going to focus on the main story thread, but even at this point there was a series of side investigations around the greater impact on the Trading Post, Gem Store, new account creation, and more. A lot happened during this incident, with input from many different teams. I feel like I could write an entire book of mini stories about this one event!)

Honestly, the restore process was pretty boring, and we just followed a best-practices guide. A few button clicks in the management utility, and the server started crunching away. Before we could start, we had to copy the backup file and the log files to the second server to make sure we were restoring the same data on both. The server would then set the state of the databases to everything contained in the backup file. This process took several hours. Then it would apply all the logged messages so eventually both databases would be accurate to 2:41 a.m. This also took a long time. The first server finished at 1:37 a.m., the second at 4:30 a.m.
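Conceptually, the restore is "load the backup, then replay the log up to a cutoff." The sketch below illustrates that idea with plain dictionaries. The data layout and names are assumptions made for the example, not the actual backup format or management utility the team used.

```python
def restore(backup, log, cutoff):
    """Rebuild state from a backup snapshot plus logged changes up to a cutoff."""
    state = dict(backup)                          # start from the snapshot
    for timestamp, key, value in sorted(log, key=lambda entry: entry[0]):
        if timestamp > cutoff:
            break                                 # ignore anything after the cutoff
        state[key] = value                        # replay the logged change
    return state


# Hypothetical snapshot and log; timestamps are simple integers here.
backup = {"gold": 10}
log = [(1, "gold", 25), (2, "level", 80), (5, "gold", 99)]

first = restore(backup, log, cutoff=3)
second = restore(backup, log, cutoff=3)
print(first, first == second)    # prints: {'gold': 25, 'level': 80} True
```

Running the same snapshot and the same log through the same procedure on both servers is what guarantees they end up identical, which is why the files had to be copied to the second server first.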

(Side note 2: Outside of the actual work you do, the people with whom you do work are probably the most important thing about choosing a company or a team. There were a handful of us, online since 5:30 or 6:00 a.m., taking only catnaps and otherwise working through the process into the early hours of the next day. Some server babysitting definitely happened! The ability for everyone in the war room to take the situation seriously yet manage themselves and have a bit of playfulness to get us all through the event cannot be overstated. This team is amazing at what they do and how they go about doing it.)

The Guild Wars 2 infrastructure is also pretty amazing. It’s able to update with no downtime and contains many knobs and levers to turn and pull that can affect different features or content in the game for exploit prevention, crash prevention, or even changing the difficulty of dynamic events without needing a build! On May 12, we utilized another feature in the game not seen since 2016—the ability to launch live without launching live. The game would be fully online, letting developers in, but not public players.

At 4:55 a.m., we “turned on” the game, and a whole host of internal developers, QA analysts, and a handful of EU partners logged in to the game. I was hugely impressed that, within minutes, players had already identified that the game was being live tested, and their progress was restored to Monday morning. I suppose that happens when we make our APIs available!

After a smoke test of the live environment that checked the database health and the message queues, we opened up the servers to the public at 5:37 a.m., over 20 hours after shutting down. Everyone on the team was grateful to be able to help everyone get back into a world they love, myself included. I was also glad to get another few hours of sleep.
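The timestamps quoted in this post can be checked against each other with a little date arithmetic. The sketch assumes the shutdown happened at exactly 9:00 a.m. (the post says "just after") and that all times are in the same time zone.

```python
from datetime import datetime

# Times as quoted in the post; the shutdown is approximated as 9:00 a.m. sharp.
failover = datetime(2020, 5, 11, 2, 41)
ops_paged = datetime(2020, 5, 11, 5, 40)
shutdown = datetime(2020, 5, 11, 9, 0)
public_reopen = datetime(2020, 5, 12, 5, 37)

undetected = ops_paged - failover      # time before anyone was paged
outage = public_reopen - shutdown      # time the game was offline

print(undetected)   # prints: 2:59:00
print(outage)       # prints: 20:37:00
```

Both results line up with the narrative: roughly three hours before the first page, and a little over twenty hours of downtime.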

Whew.

After a few hours of rest, we all logged back in to work to try to understand what exactly was the root cause of the failure. That afternoon, an alarm triggered in our EU region—the database message-queue volume was too high.

Kitten. Again.

We had correctly identified and added a few alerting mechanisms for these newly discovered failure points. For the next two days, we manually managed the database mirroring and connections between the primary and secondary. We talked to database administrators, network operators, and our technical account managers from AWS. We configured every setting we could relating to message queues, volumes, storage, and logs. After days of reading documentation, gathering knowledge from experts, and attempting changes, we finally had our breakthrough on Friday.

Drum roll please!

The root cause was the drivers.

Drivers are what allow operating systems to communicate with connected devices. If you have peripherals like a gaming mouse or a drawing tablet, you’ve probably downloaded specific drivers for your stuff.

In this instance, the server’s operating system did not have the latest communication method to interface with the hard drive. Not great on a database where reading and writing data to disk is 99% of its entire reason for existence… (The other 1% is the “wow” factor. Wow, you’ve got a database? Cool.)

When we upgraded the servers, we were moving from AWS’s 4th generation to their 5th generation servers, and with that came a difference in how the servers interact with connected devices (I don’t fully comprehend how the system works, but AWS has a cool name for the underlying technology: Nitro!). Our driver version was ahead of the basic version provided by AWS, so we did not expect to need additional updates. Plus, in our dev environments, this was not an issue at all! But we also didn’t have anywhere near the load there that we have on the live game. Despite it being another Friday and recognizing the consequences of an unsuccessful driver update, we decided to move forward.

As before, we severed the connection between the databases, ran the driver updates, and then reconnected.

The queue drained nigh instantly.

The data write speed measured 100,000 kilobits per second, up from the 700 Kbps we had been seeing previously.
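To put those two figures side by side, here is a small calculation. The two rates come from the post; the backlog size is a hypothetical value, and the drain estimate ignores new traffic arriving while the backlog is written.

```python
BEFORE_KBPS = 700        # write speed before the driver update (from the post)
AFTER_KBPS = 100_000     # write speed after the driver update (from the post)


def drain_seconds(backlog_kilobits, kbps):
    """Seconds to write a fixed backlog at a constant rate, ignoring new traffic."""
    return backlog_kilobits / kbps


speedup = AFTER_KBPS / BEFORE_KBPS
backlog = 8_000_000      # hypothetical backlog: 1 GB expressed in kilobits

print(round(speedup, 1))                                      # prints: 142.9
print(round(drain_seconds(backlog, BEFORE_KBPS) / 3600, 1))   # prints: 3.2 (hours)
print(drain_seconds(backlog, AFTER_KBPS))                     # prints: 80.0 (seconds)
```

A speedup of roughly 143 times is the difference between a backlog that takes hours to clear and one that is gone in about a minute.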

I smiled. I laughed (somewhat uncontrollably).

We repeated the process on the other server. Again, the queue drained in a few blinks of an eye.

I slept that weekend. And I’ve had so many good Monday mornings since then.

Looking Backward and Forward

So, that’s what actually happened, what we missed, and why the incident occurred. Next is the part that I personally love about my job: What lessons did we learn? How did we use them to grow and improve? What friends did we meet along the way?

Well, first and foremost, this was a solid reminder for us on how different the dev and live environments are. The dev environment is great for practicing, recording specific performance metrics, and more. However, for some changes, key tools like load testing and stress testing across multiple machines are still necessary.

Second, it was a reminder to always check assumptions and take a step back to look at the macro view if something is not behaving as expected. We had done our due diligence in the days before the outage, but we focused on the wrong part of the problem. Checking in with someone else would have provided us a different perspective and a need to defend the irregularity we were seeing to determine whether it could be an issue.

The most impactful change for our databases was to increase alerting on key database metrics, not just system metrics like CPU or hard drive space. For our live operations, we added a number of alerts into a third-party tool to improve our response time for future issues. And for general operations, we’ve improved the record-keeping of our AWS infrastructure, now tracking more than just the instance type. Our reports now include instance types, generation, drivers, and storage types. We built a common package to install on all new servers that includes specific driver versions. Any future migration plans will update this common package, ensuring that we don’t repeat this issue again.
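An inventory that records generation and driver versions makes this kind of mismatch checkable before a migration. The sketch below is purely illustrative: the record fields mirror the ones listed above, but the instance names, version numbers, and the `REQUIRED_DRIVER` table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    instance_type: str
    generation: int
    driver_version: tuple    # e.g. (2, 1, 0)
    storage_type: str


# Hypothetical minimum driver version required for each hardware generation.
REQUIRED_DRIVER = {5: (2, 0, 0)}


def outdated(instances):
    """Names of instances whose driver is older than their generation requires."""
    return [
        inst.name
        for inst in instances
        if inst.driver_version < REQUIRED_DRIVER.get(inst.generation, ())
    ]


# Hypothetical inventory entries.
inventory = [
    Instance("eu-db-1", "db-gen5", 5, (1, 9, 0), "ssd"),
    Instance("eu-db-2", "db-gen5", 5, (2, 1, 0), "ssd"),
    Instance("dev-db-1", "db-gen4", 4, (1, 0, 0), "ssd"),
]
print(outdated(inventory))   # prints: ['eu-db-1']
```

Comparing versions as tuples keeps the check simple, and generations with no listed requirement are never flagged.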

We have completed the migration for all the remaining database instances and more, providing higher performance for improved service. In the last fourteen months, we’ve recorded an uptime of 99.98%, with only five minor service interruptions impacting user log-ins.
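As a back-of-the-envelope reading of that uptime figure, the percentage can be converted into total downtime. The post does not say how uptime is measured, so this assumes it covers every hour of the period and uses an average month length.

```python
HOURS_PER_MONTH = 365.25 * 24 / 12   # average month length, an approximation


def downtime_hours(uptime_pct, months):
    """Total downtime implied by an uptime percentage over a number of months."""
    return (1 - uptime_pct / 100) * months * HOURS_PER_MONTH


hours = downtime_hours(99.98, 14)
print(round(hours, 2))               # prints: 2.05
```

Under those assumptions, 99.98% over fourteen months corresponds to roughly two hours of total interruption.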

Our continued efforts are always targeted at providing you with the best experience and usability of our services. We love to celebrate the design architected by those before us and the tools and processes we utilize to retain our world-class uptime, and we are pleased to bring our current availability streak back over the one-year mark. We recognize that we may not achieve perfection, but we will certainly strive for it with every future procedure and deployment. As we look forward to the exciting new features and projects the Guild Wars 2 gameplay and design teams are bringing to you, we are constantly working behind the scenes to make sure that you can always log in and enjoy.

Wow, so, I know I can write a lot of code; apparently, I can also write a pretty long blog post. I do hope you enjoyed reading. For some, this post may be a long time in the making, but I am actually very pleased to be able to share stories like this with you. We’d love to read your thoughts and feedback about this type of post, so we’ve set up a discussion thread on our official forums!

See you online in Tyria,
Robert

