tristanj 9 years ago

Looks like it's a separate issue from the one found last year, where the FAA directed that 787s be powered off before 248 days of uptime to avoid an integer overflow bug.

http://arstechnica.com/information-technology/2015/05/boeing...

  • justinsaccount 9 years ago

    yeah,

    I'd say if they are saying you need to reboot every 22 days, the actual limit is at least 23.. Why cut things close?

      22*3600*24 = 1900800 (x1000 for milliseconds)
      23*3600*24 = 1987200
      24*3600*24 = 2073600
      25*3600*24 = 2160000
    

    and, as it turns out

      2^31 / 1000 = 2147483
    

    If they are storing a time value as milliseconds in a signed 32-bit int, that would roll over every ~24 days, 20 hours, 31 minutes.

    • qb45 9 years ago

      Haha, I think you nailed it. Windows 95 had exactly the same bug, except that the counter was unsigned, so it took about 49.7 days to crash. I only hope these machines aren't flying some modified W95 kernel :)
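
The rollover arithmetic in the sub-thread above can be checked with a short sketch. It assumes, as the comments do, a tick counter that starts at zero at power-on; the helper name `rollover_time` is made up for illustration, and the 100 Hz tick rate used for the 248-day case is an assumption, not something stated in the thread.

```python
from datetime import timedelta

def rollover_time(bits: int, ticks_per_second: int, signed: bool) -> timedelta:
    """Time until a counter of the given width wraps, counting up from zero."""
    # A signed counter gives up one bit to the sign, so it overflows at 2**(bits-1).
    limit = 2 ** (bits - 1 if signed else bits)
    # Integer microseconds keep the result exact (no float rounding).
    return timedelta(microseconds=limit * 1_000_000 // ticks_per_second)

# Signed 32-bit milliseconds: the guess made above for the 22-day directive.
signed_ms = rollover_time(32, 1000, signed=True)
# Unsigned 32-bit milliseconds: the Windows 95 case mentioned above.
unsigned_ms = rollover_time(32, 1000, signed=False)
# Signed 32-bit counter at an ASSUMED 100 ticks per second: lands near 248 days.
signed_cs = rollover_time(32, 100, signed=True)

print(signed_ms)    # 24 days, 20:31:23.648000
print(unsigned_ms)  # 49 days, 17:02:47.296000
print(signed_cs)    # 248 days, 13:13:56.480000
```

A 22-day reboot interval would leave a margin of a little under three days against the signed-millisecond limit, which fits the "why cut things close?" reasoning above.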

zanecodes 9 years ago

>A permanent software fix is anticipated in the second quarter of 2017

Who wants to bet that the permanent fix will be 'automatically reboot the computers if the plane has been powered on for more than a week and stationary on the ground for more than N minutes'

  • disposablezero 9 years ago

    Actually, it's likely given the publicly-traded pressure toward the short-term cheapest option. Any substantial fix would be subject to passing QA/QC and validation processes given that safety-critical systems tend to use the waterfall development model. But, unfortunately, the explosion of subsystem features and over-engineered solutions leads to a factorial multiverse of subtle gotchas and inability for humans to prove nonfunctional requirement properties. Formal verification, feature hesitance ftw.

    • WalterBright 9 years ago

      > publicly-traded pressure toward the short-term cheapest option

      Boeing plays very much the long term game. It took something like several years in development and 10 years in production for the 747 to make a profit, for example. It is simply impossible to design and manufacture airliners on quarterly results.

      > cheapest

      Boeing is very well aware that the most expensive option for them would be to acquire a reputation for cheap, unsafe, crummy airliners.

      Source: am former 757 flight controls design engineer, and happy Boeing shareholder for 35 years.

      • dingaling 9 years ago

        Boeing changed in 1997 when the MDC bean-counters took over. Look at their line-up for 2020: fourth iteration on the 737 with MoM shaping up to be a fifth, third iteration on the 747 if it's still alive, third iteration on the 777, 787 range split across the -9/-10 and the -8 which is basically a completely different aircraft due to the chaotic and rushed development.

        "No more Moonshots" is their current self-professed philosophy:

        http://www.seattletimes.com/business/mcnerney-no-more-lsquom...

        And that's not even addressing their creative methods of program accounting, or the fact that their suppliers now have to wait 120 days for payment.

        http://www.reuters.com/article/us-boeing-suppliers-idUSKCN0Z...

        None of those sound like the Boeing of old that "bet the farm" on the 747. Which, anyway, was only meant to be a short-term airframe until supersonic airliners took over...

  • trevyn 9 years ago

    There's a bug in your permanent fix. :)

    (What happens if the plane is never stationary on the ground for more than N minutes?)

    • maxander 9 years ago

      Then, for reasonable N, it's not getting regular maintenance and should eventually disable the engines (while landed) as a failsafe. :)

      Although, after the recent Mars lander fiasco, I wonder about the ability of our hardware to even reliably tell whether it's landed or not. >_>

  • detaro 9 years ago

    I'm kind of surprised it requires a software fix. With the amount of maintenance checks and checklist procedures in aviation, "reboot plane every 2 weeks" doesn't sound like a big issue. What's worrisome is that the manufacturer didn't know about it and didn't put it on the checklists.
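
The reboot rule joked about in this thread, and the gap trevyn points out in it, can be sketched roughly as follows. This is only an illustration: the thresholds, the `AircraftState` fields, and the function name are placeholders, since the comments only say "more than a week" and "N minutes".

```python
from dataclasses import dataclass

MAX_UPTIME_DAYS = 7        # "more than a week" from the comment above
GROUND_IDLE_MINUTES = 30   # placeholder for the unspecified N

@dataclass
class AircraftState:
    uptime_days: float
    on_ground: bool
    stationary_minutes: float

def should_auto_reboot(state: AircraftState) -> bool:
    """Reboot only when the uptime is overdue AND the plane has been parked long enough."""
    return (
        state.uptime_days > MAX_UPTIME_DAYS
        and state.on_ground
        and state.stationary_minutes > GROUND_IDLE_MINUTES
    )

# The gap: a plane on short turnarounds never meets the idle condition,
# so its uptime keeps growing no matter how overdue the reboot is.
quick_turnaround = AircraftState(uptime_days=23, on_ground=True, stationary_minutes=25)
print(should_auto_reboot(quick_turnaround))  # False
```

A rule like this would presumably still need a hard upper bound on uptime enforced some other way, such as the checklist item detaro suggests.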

farzadb82 9 years ago

In my day we didn't have no fandangled "reboots". All we did is give our equipment a little percussive maintenance to get them back into service!

raverbashing 9 years ago

It's amazing how hardware manufacturers still can't do software.

I know, crap happens, but gosh darn it.

And here's the thing: bureaucracy doesn't help because it prevents people from raising issues unless they're big enough.

Every monotonically increasing counter needs to be reviewed for overflow behaviour and time to overflow
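
One way to make overflow behaviour explicit in such a review is to do all time comparisons with modular arithmetic, so that a wrapped counter still gives the right answer. A minimal sketch, assuming an unsigned 32-bit tick counter; the function names are hypothetical and not taken from any real avionics code.

```python
COUNTER_BITS = 32                 # assumed counter width
MASK = (1 << COUNTER_BITS) - 1    # 0xFFFFFFFF

def elapsed_ticks(start: int, now: int) -> int:
    """Ticks from start to now, correct across one wrap of the counter."""
    # Subtracting modulo 2**32 cancels the wrap, provided the true
    # interval is shorter than one full counter period.
    return (now - start) & MASK

def naive_deadline_passed(now: int, deadline: int) -> bool:
    """Direct comparison: wrong whenever the deadline has wrapped but now has not."""
    return now >= deadline

def safe_deadline_passed(now: int, deadline: int) -> bool:
    """Wrap-safe: a modular difference under half the period means 'passed'."""
    return ((now - deadline) & MASK) < (1 << (COUNTER_BITS - 1))

start = 0xFFFFFF00                 # 256 ticks before the counter wraps
deadline = (start + 500) & MASK    # wraps around to 0xF4
now = 0xFFFFFF80                   # only 128 ticks after start

print(elapsed_ticks(start, 0x10))            # 272
print(naive_deadline_passed(now, deadline))  # True (wrong: deadline is 372 ticks away)
print(safe_deadline_passed(now, deadline))   # False (correct)
```

The wrap-safe comparison only holds while intervals stay under half the counter period, so the time to overflow still has to be known and documented for each counter.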

  • grogenaut 9 years ago

    Show me a piece of software by "Software manufacturers" without a bug and I'll show you a piece of software with not enough users.

    Aka: https://www.google.com/search?q=linux+uptime+bug

    • raverbashing 9 years ago

      Good, now search windows uptime bugs, zune leap year bug.

      No, there aren't systems without bugs, but some companies make it worse. I know, I have "worked" for some of them

      • grogenaut 9 years ago

        I wasn't picking on Linux. I was using it as an example of what people consider stable, and pointing to long-term uptime bugs like the one the airliner is suffering from.

  • thelambentonion 9 years ago

    The software on these planes is written by software engineering companies. This is less of an issue of hardware manufacturers doing software wrong, and more of an issue with how monumentally complex, high assurance systems are currently implemented in software.

pasbesoin 9 years ago

So, this applies to an entire jetliner?!

"Have you tried turning it off and back on again?"

I don't know whether I'm unsettled, or strangely comforted by the consistency of the world.

disposablezero 9 years ago

Add to the list: Embraer EMB-505 Phenom 300, most turboprops.

vacuumator 9 years ago

A 22 day memory leak...

I hope that's the earliest possible occurrence, and not just the typical average observation.

Wouldn't want to see an edge case, where the system throws an OOM, at 21 days, due to a process running fast because the CPUs are kept cooler than usual, or something weird like that.

exabrial 9 years ago

Another bug: the built in USB charging ports in the back of the seats fried my galaxy s7 data connection! After my flight, the USB functionality and the quick charge function of my phone completely stopped working!

doggydogs94 9 years ago

For computers big and small, the go-to solution is Reboot.

  • gomijacogeo 9 years ago

    Until all the MRAM and other persistent technologies hit. Then get ready for a generation of college graduates telling us old-timers that 'reboots' are unnecessary and that immediately loading the previous system state is the new mason jar of systems goodness.