Former XBox 360 Engineer Speaks Out on RROD Problems

 

Well, the worst-kept secret at Microsoft for the past couple of years has definitely been that the XBox 360 has extremely serious design flaws, leading to an insane number of system failures. For a long time, the company line was that failures were within industry norms (which clearly wasn't true). Eventually, the company had to face the reality of the situation and extend the system warranty for "Red Ring of Death" errors to three years, as the percentage of systems that were failing was far outside any acceptable level.

8bitJoystick.com has interviewed a former Microsoft employee (a member of the XBox and XBox 360 design teams), who came clean about the system's widespread failures.

The selections below are taken from the full interview, available at 8bit Joystick:

Q: So what do you think the real failure rate of the Xbox 360 is? Some have estimated it as high as 30%. I got my Xbox in early 2007 and so far so good, but what do you think the chance is that it's going to die on me one day?

A: It's around 30%, and all will probably fail early. This quarter they are expecting 1 M failures, most of those Xenons. Some of those are repeat failures. Life expectancy is all over the map because the design has very little margin for most of the important parameters. That means it's not a fault tolerant design. So a good unit may last a couple of years, while a bad unit can fail in hours. I have a launch unit and have not had a single problem with it. And it's used a lot. But I don't know anyone else with a 360 that hasn't broken, except you now. There's no way to tell when yours might die. But the cooler you can keep it, the longer it will probably last. So stand it up, keep it in free air, etc.

Note: Xenon was the code name for the first Xbox 360 mother board.

Q: Of all six videogame systems on the market now (PS3, PSP, PS2, DS, Wii and 360), only the Xbox 360 has had such major hardware failure problems. Microsoft is the only company based in the US making a videogame system. What part of Microsoft's way of doing things do you think caused this situation to happen?

A: First, MS has under resourced that product unit in all engineering areas since the very beginning. Especially in engineering support functions like test, quality, manufacturing, and supplier management. There just weren't enough people to do the job that needed to be done. The leadership in many of those areas was also lopsided in essential skills and experience. But I hear they are really trying to staff up now based on what has happened, and how cheap staff is compared to a couple of billion in cost of quality.

Second, MS was so focused on beating Sony this cycle that the 360 was rushed to market when all indications were that it had serious flaws. The design qual testing was insufficient and incomplete when the product was released to production. The manufacturing test equipment had major gaps in test coverage and wasn't reliable or repeatable. Manufacturing processes at all levels of suppliers were immature and not in control. Initial end to end yields were in the mid 30%. Low yields always indicate serious design and manufacturing defects. Management chose to continue to ship anyways, and keep the lines running while trying to solve problems and bring the yields up. Whenever something failed and there was a question about whether the test result was false, they would remove that test, retest and ship, or see if the unit would boot a game and run briefly and then ship. 360 is too complex of a machine to get away with that.

In the end I think it was fear of failure, ambition to beat Sony, and the arrogance that they could figure anything out, that led to the decision to keep shipping. That management team had made some pretty bad decisions in the past and had never had to pay a proportional consequence. I'm sure they thought that somehow they would figure it out and everything would end up ok. Plus, they tend to make big decisions like that in terms of dollars. They would rationalize that if the first few million boxes had a high failure rate, a few 10's of millions of dollars would cover it. And contrasting that cost with a big lead on Sony, would pay it in a heartbeat. They weren't even thinking about Nintendo.

Compare that to Sony, who delayed their launch, even though they were behind, when their box wasn't ready.

Q: In your opinion what do you think the main cause of the Red Ring of Death failures have been?

A: RROD is caused by anything that fails in the "digital backbone" on the mother board. Also known as a core digital error. CPU, GPU, memory, etc. Bad parts, incompatible parts (timing problems) bad manufacturing process (like solder joints), misapplied heat sinks or thermal interface material, missing parts, broken parts, parts of the wrong value, missed test coverage. Any one or more, on any chip, or many other discrete components, would cause this. And many of the failures were obviously infant mortality, where they work when they leave the factory and fail early in use. The main design flaw was the excessive heat on the GPU warping the mother board around it. This would stress the solder joints on the GPU and any bad joints would then fail in early life.

There are also other significantly high failure rates in other areas, like the DVD.

Q: How much more reliable are the current generation of Xbox 360 than the previous designs? Original Xenon, Zephyr and Falcon.

A: I've heard that the failure rate for the current design is sub 10%. Much much better, but still too high imho. And those designs haven't seen much life yet, so no one knows if that failure rate will hold.

Q: How many times does an Xbox 360 unit have to be sent in and repaired before they will replace it with a completely new unit?

A: That's not how it works. You send in a broken box, you get back a working box (hopefully). So there is a rotating stock of the original units that get repaired and returned to service. Plus, they keep finding these caches of launch units here and there and using them too. Didn't you hear during the holidays that bundles were found with units made in 06? Those were pulled back from the retail channel last spring when the new heatsink was done, and had the new heatsink placed on them and then put into the shipping flow like any other box.

Back to the rotating inventory of launch units. You risk getting one of those back until the last one is out of the system. I imagine the next big outrage will be when someone who waited till Falcon to buy a console for reliability reasons has to send it in for service and gets a Xenon back! Even when all of the Xenons are gone, you will likely get a newer gen repaired one back rather than new. Unless the fail rate gets so low there are none available. I'm holding my breath...

Q: There has seemed to be an executive exodus from the top of the Xbox project. Seamus Blackley, Peter Moore, James Allard. Do you think that there is something that has been causing the "fathers of Xbox" to want to move on?

A: Seamus left a long time ago, and I think there was some conflict so that it wasn't entirely voluntary. J Allard left to go do Zune (along with Greg Gibson), and is a big part of the team who owns the strategic vision of MS E&D under Robbie Bach. Peter was a surprise. He sure left in a hurry, and not the way top people usually go, which is usually with a longer notice. And right after the warranty extension announcement. I don't know if they are related, but it looks like they could be in some way. I noticed you didn't mention Ed Fries, who left in 04. I heard he landed at Sony, but can't verify. But I don't see the senior team wanting to move or moving. Very few people who leave do so voluntarily. Note: I did forget to mention Ed Fries.

Q: Do you see much of a long term future for Microsoft's Entertainment & Devices Division? I saw that they just got a new campus and troubled projects rarely get new expensive buildings. Do you see that division ever turning a profit? So what do you think their overall hardware strategy is? Do you think that they will still be selling videogame systems and music players in five years?

A: Xbox's mission statement is to preserve the Windows monopoly and extend it into the living room, as a media extender for a Media Center PC, along with a host of other MS and other companies' hardware devices that fit into a digital entertainment lifestyle. MS has the bucks to keep losing money on Xbox for a long time, maybe forever. They've already lost around 6 billion dollars. How are they ever going to make that back on Xbox? They can't. Maybe they don't think they have to. That amount might be just 1 or 2 quarters of profit for an integrated hw/sw portfolio, with Windows, PC Hardware, Xbox, Zune, TV, Movies, ads, etc., all providing some revenue stream to MS. You should check out their jobs site sometime. You can learn a lot about what they are doing. And their patent applications. They have a team working on making PCs now. That voice activated thing they did for Ford? Where do you think you will see that next? MS devices and sw is my guess.

That new H&E campus says that MS is getting into consumer electronics in a big way, and you can bet they are working to refine a strategy of integrating their offerings into a digital lifestyle universe, with most everything covered that we could want to stay productive, connected and entertained. Not piecemeal, like some companies seem to be approaching electronics. Look at Apple. They are doing great, keep rolling out innovative stuff, but what's their vision and strategy to implement? What's their roadmap and timeline? How does it all go together, work together? I can't tell from what they say or do. But I can see what MS is trying to do. They are just getting started I think. So yes, they will still be doing this in 5 years. But they really need to mature their business and change some blood in there. Hire some key people who have experience running large hardware companies who can put the right organization, process and infrastructure in place. If they don't, they may continue to have quality and operational issues that will really dampen their progress. And with all of the external challenges in consumer markets, even MS can't afford to be its own enemy for too much longer.

Wow! So, in short, failure rates are around 30%. Microsoft knew it had problems and hadn't tested its design nearly enough, but decided to ship systems to customers anyway. Several of the best-known executives behind the XBox project have moved on to other things (or taken the fall, and left the company). And, best of all, the XBox mission statement is to extend the Windows monopoly into the living room. Sometimes you really have to wonder how some people (read "fanbois") can feel such affection for ANY company... much less one that behaves in this way.
