What’s the worst bug/development challenge you encountered and how did you solve it?

Hello Community,

To all the bug-slaying champions out there, you know we admire your skill at sniffing out these pesky :beetle: and reporting them to us. We are committed to resolving every bug that comes our way. :tada:

We invite you to share your stories about the most annoying bugs/development challenges you’ve ever encountered in your professional life and how you managed to solve them. It can be anything. Whether it was a small one or a colossal issue where all hell broke loose, we want to hear the story of your triumph over these :beetle: :exploding_head:

So share your bug-slaying escapades. It’s a fun way to connect and learn from each other’s bug battles.

Thank you for your participation. :heart: :partying_face:


Interesting timing! Just last night I was catching up on Darknet Diaries, and this episode recounts how the 1962 NASA Mariner 1 mission failed because of a bug, and how that helped launch software engineering as a discipline (it also includes an interview with the Google Project Zero team member responsible for identifying the NSO Pegasus bug): Maddie – Darknet Diaries

From the transcript:

In 1962, the first Mariner spaceship was launched, and it was headed for Venus. It didn’t have anyone onboard. It was controlled remotely, and onboard were just electronics, antennas, computers, jet fuel, and cameras. But only a few minutes after launching, things started to go wrong. The computer onboard that was in charge of controlling the ship was acting erratic, giving all kinds of wild commands for the ship to do. The folks at Mission Control tried to correct the computer gone wild, but they couldn’t do anything about it. Then they started to realize this rocket’s not gonna make it to Venus. It’s not even gonna make it out of the atmosphere, and it might even crash into Earth and hurt someone. So, the people at Mission Control decided there was no choice but to push the self-destruct button and blow up Mariner 1 over the Atlantic Ocean. That was the end of the Mariner 1 spacecraft, an $18.5 million-ship blown up. So, what happened?

Well, scientists and engineers spent days replaying the events and logs that they captured after launch. A piece of hardware failed, which caused an onboard computer to kick in and try to control the craft, but the way it was trying to control the craft wasn’t right. Something was wrong with that computer, so they examined the code that was put on that computer, and that’s when they saw the problem; a missing dash in the algorithm. A single missing dash. It’s not like the dash you’re thinking; it’s more like a bar that was supposed to be above the letter R, which stands for radius, and that meant it should have been a smoothed value for radius. Without this bar, it was taking the current value for R, and since this rocket was trying to recover from some bad hardware, the values for R were bouncing all over, so the output of the program was bouncing all over. It should have been taking an average reading for R, not the wildly fluctuating values. So, the computer was telling the rocket to fly all crazy and out of control.

The logic and algorithm that the scientists gave the programmer was correct, but whoever programmed that algorithm into the computer missed this little dash above the R, and because of that tiny little bug in the code, it resulted in the whole rocket being destroyed. When NASA makes a mistake like this, they try to find ways to prevent anything like this happening in the future. They realized they were implementing software on a lot of systems, but had no way to test the reliability of that software. This is when it became clear that software engineering should be a discipline, and shortly after that, it started getting developed and became a thing. This software bug didn’t just crash a spaceship, but it launched a whole new field of study and new principles for designing, developing, and testing computer software.
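To make the smoothed-versus-raw difference concrete, here is a toy Python sketch. It does not reproduce Mariner 1’s actual guidance equations: the radius readings are made up, and a simple moving average stands in for whatever smoothing the real R-bar used.

```python
def running_average(values, window):
    """Simple moving average: each value averaged with up to window-1 previous values."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def spread(xs):
    """Distance between the largest and smallest value."""
    return max(xs) - min(xs)


# Hypothetical radius readings: a steady value of 10 plus spikes from faulty hardware.
raw_r = [10.0, 10.0, 14.0, 6.0, 10.0, 13.0, 7.0, 10.0]
# What the code should have used (a smoothed R) versus what it actually used (raw R).
smoothed_r = running_average(raw_r, window=4)

print(spread(raw_r))       # 8.0
print(spread(smoothed_r))  # about 2.33
```

Any control logic fed `raw_r` swings as wildly as the readings do, while `smoothed_r` stays within a much narrower band around 10.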

Personally, I find I do better when my IDE is actually connected to the remote code. Changing only the local files makes troubleshooting just that much longer… :sweat_smile:


It all started three months after I began learning software engineering. In the evenings at home after school, I couldn’t stop coding, so I was already embarking on another project: a seemingly innocent little PHP web application.

As part of this project, I had coded a function that was supposed to make an API call to fetch data and display it on the webpage. Yet, every time I ran the function, it spit out some random values. I thought I was developing a website, but it seemed more like I had accidentally invented a dysfunctional random number generator :thinking:

I debugged, dug into the code, scrutinized every line. I even gave names to the variables, hoping to invoke the programming gods. Yet the bug stuck around like an overly attached pet. My function seemed to have developed a penchant for chaos, breaking free from my control, and causing mayhem in my orderly universe.

One night, after hours of staring at my screen, I fell asleep in front of the computer. No dreams about cars, or even fancy video games. Instead, I dreamt of PHP scripts, variables, and API calls. And somewhere in the midst of the dreamy codescape, a revelation came to me. I awoke with the same adrenaline rush you get when you finally place that elusive jigsaw piece. Could the issue lie not within my PHP code, but in the API itself? :bulb:

With newfound determination, I scrutinized the API documentation, dissected it, broke it down. Turns out, the culprit was hiding right there, in the API’s pagination settings. I had been naively fetching data from the wrong page number all along, hence the seemingly random values.
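The post doesn’t name the API, so this Python sketch assumes a hypothetical `fetch_page(page)` that takes a 1-based page number and returns `{"items": [...], "has_more": bool}`. Real APIs vary (0-based pages, offsets, cursors, “next” links), which is exactly the detail worth checking in the docs. An in-memory fake stands in for the network so the example runs offline.

```python
from typing import Callable


def fetch_all(fetch_page: Callable[[int], dict], first_page: int = 1) -> list:
    """Collect items from every page of a (hypothetical) paginated API.

    fetch_page(page) is assumed to return {"items": [...], "has_more": bool}.
    """
    items = []
    page = first_page
    while True:
        response = fetch_page(page)
        items.extend(response["items"])
        if not response["has_more"]:
            break
        page += 1
    return items


# Fake API for illustration: 7 records served 3 per page, pages numbered from 1.
RECORDS = list(range(1, 8))
PAGE_SIZE = 3


def fake_fetch_page(page: int) -> dict:
    start = (page - 1) * PAGE_SIZE
    chunk = RECORDS[start:start + PAGE_SIZE]
    return {"items": chunk, "has_more": start + PAGE_SIZE < len(RECORDS)}


print(fetch_all(fake_fetch_page))                # [1, 2, 3, 4, 5, 6, 7]
print(fetch_all(fake_fetch_page, first_page=2))  # [4, 5, 6, 7]: page 1 silently skipped
```

Starting from the wrong page doesn’t raise any error; it just returns a different slice of the data, which is why this kind of bug can look like “random values”.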

Once I fixed the API call, my function fell in line like a well-behaved soldier, fetching and displaying data exactly as intended. The beast had been tamed, the chaos contained.

Now, in the grand scheme of bug-slaying, you might think this is barely a skirmish. But for me, this was my first victory against the buggy beasts that roam the wilderness of code. Ever since that day, I’ve been ready for whatever comes next, because the world of software engineering is vast, and there are bigger beasts to slay :person_fencing:


A few years ago, I was running a hardware company. We were working on our first product, an IoT camera with 2 small screens to give users feedback (it looked like an owl). The product basically had an Android phone inside to handle H.264 compression/streaming and the “smart” features, plus a very small SoC to handle the eye animations and motion detection.
The small SoC (an STM32) was in charge of waking up the Snapdragon, to reduce power consumption, but the tricky part was updating the firmware.
The Android phone was handling the STM32 firmware update, but the STM32 was powering the phone.
Of course, we had a way to make sure the Snapdragon wouldn’t go to sleep while updating the STM32, and of course, in some extreme edge cases, it did go to sleep.
That bricked some devices, since there was no way for users to flash the STM32 without disassembling the product.
Fortunately, we caught the bug… on a Monday morning, since I’d had the brilliant idea of rolling out the new version on a Friday.
It was a hot mess, and I ended up sending out new devices (I got the bricked ones back and we flashed them at the office). Only 10 devices were impacted, but it could have been much worse: we had already delivered 10,000 units at that point.
Everybody knows you shouldn’t release on a Friday, not even a minor update. We thought it would go smoothly, so I chose to ignore the basic rules, and the silicon gods punished us. Lesson learned.
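The post doesn’t say how the “don’t sleep during the update” rule was implemented. As a hypothetical sketch, one common pattern is a reference-counted stay-awake guard (Android exposes a similar idea as `PowerManager.WakeLock`). The names `SleepGuard`, `stay_awake`, `may_sleep` and `update_coprocessor_firmware` are made up for illustration. The guard only protects you if every code path that can put the processor to sleep consults `may_sleep()`, and a path that skips that check is exactly the kind of edge case that slips through.

```python
import threading
from contextlib import contextmanager


class SleepGuard:
    """Hypothetical reference-counted 'stay awake' lock.

    The main processor may only go to sleep while no one holds the guard.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._holders = 0

    @contextmanager
    def stay_awake(self):
        with self._lock:
            self._holders += 1
        try:
            yield
        finally:
            # Released even if the update raises, so a failed update can't keep the device awake forever.
            with self._lock:
                self._holders -= 1

    def may_sleep(self) -> bool:
        with self._lock:
            return self._holders == 0


guard = SleepGuard()


def update_coprocessor_firmware(chunks, write_chunk):
    """Hypothetical updater: every sleep decision elsewhere must check guard.may_sleep() first."""
    with guard.stay_awake():
        for chunk in chunks:
            write_chunk(chunk)
```

Reference counting lets several independent tasks keep the device awake at once; sleep is only allowed once the last one has finished.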


Some years ago, I was working at a company that developed AIDS patient management software. The software was deployed on bare-metal Linux servers, set up at our office and shipped to hospitals all around the world.

:penguin: :hospital: :computer: :earth_americas:

My coworker and I were in charge of managing these servers remotely, performing updates, etc. We had only one root account and only one SSH private key to connect to them.

One day, my coworker left the company, and I had to change the only root access to all the servers by replacing the SSH key.

I created a new private key on my computer and deployed the new public key on all servers, job done!

I didn’t save the new key in a safe place right away, and the next day I deleted it by accident :open_mouth: I was locked out of all the servers :scream:

I still had some standard user access, but without root rights, I wasn’t able to update the installed software :confused:

Fortunately, we had a working backup strategy (we used BackupPC), so I was able to restore the old SSH public key and regain root access with the old private key.

I re-created and deployed a new key :key: that I immediately saved in a safe place.


Thank you for those stories. Well, we all know those small :beetle: can wreak havoc. With this in mind, our awesome passbolt team member @shelby has written an article on some blunders developers can make when it comes to secure development.

So stay ahead of the game, sharpen your skills, and build more secure applications. Check out the article here: Secure development mistakes you might not know you’re making | passbolt

Have you ever made any of these mistakes?

I have so many things to say on a topic like this, and the more I think about it, the more ideas I have :sweat_smile:.

I’m torn between:

  • procedural map generation that I couldn’t get to work because I was too stubborn about my ideas
  • dynamic texture generation in Unity, with multiple possible solutions but none that worked
  • the smallest bug fix I ever had to make, which took me far too long to figure out
  • creating a Super Nintendo game, because I’m me after all and apparently I like wasting my time

But I’ll go with the procedural map generation. If you want to hear about the others, let me know, and I’ll tell you more about my dev life.

It was a very interesting challenge/experience in my developer life that had a significant impact on how I see development. I was working on a personal project. Heavily influenced by the game Super Metroid, I wanted to build a procedurally generated map that sticks to a grid.

Basically, it was supposed to look something like this:

[Image: map of Brinstar from Super Metroid]
(source: https://static.wikia.nocookie.net/metroid/images/4/46/Brinstar.png/revision/latest?cb=20080607235256)

But I wanted to add my own ideas to it. So the first step was to analyze what I needed from this generation. I thought about it a lot, wrote my ideas down in a notebook, and gave them a try to see if they worked.
It didn’t.
I went through that cycle of thinking, writing, coding, trying, and failing again and again. Sometimes I lost hope and gave up on it for months before trying again.

I spent about 5 or 6 years like that (no joke). There was always a small issue somewhere that broke the whole thing in a way I couldn’t understand. Obviously the solution wasn’t right, but why? I inspected every single character of the code, spent a lot of time debugging step by step, rewrote it from scratch, changed its architecture, etc.
Nothing worked except removing the grid constraint, but that felt unsatisfying to me, and honestly more like a failure.

During those years I met one person in particular at a Game Jam. He was already working on a video game that was being sold on itch.io as a beta at the time (now it’s even on Steam and Nintendo Switch, btw). We weren’t on the same team at that Jam. At one point I glanced at his screen from a distance… He was already at the fine-tuning stage, while my team was still struggling and had almost nothing.

I thought he was a very good developer and that I still had a LOT to learn (I knew that, but I couldn’t imagine such a gap). We kept in touch, and one day we talked about development and his video game. The tools he used were custom-made, with an architecture that blew my mind: the game editor he built was actually a PHP-served web page running JavaScript, with the data stored as XML. That was very surprising; it seemed like a cocktail ready to blow up at any moment. And on top of that, he said something like “I don’t usually show my code to people, it’s very dirty.”

It was hard to believe this combination could work: technologies not tailored for video games + dirty code. But he had his own constraints like everybody, and this solution suited him. In the end, he had “dirty” working code and I had “clean” non-working code…
Which is better?

Thinking about that, I tried another approach. This time, I wouldn’t focus on code architecture, and it wouldn’t matter if I found the code dirty. Let’s just try to make it work and see.

I achieved my goals in two evenings (4 to 5 hours at most)…

I learned a lot from that experience. Using a “dirty” approach first worked better because I couldn’t know which architecture to implement before identifying the solution I needed. Only afterwards could I think about making it cleaner. I had been trying to solve too many problems at the same time, and not in the right order, which made it impossible for me.

(For fun, I’ll post a GIF later when I’m back home to show you the result of that procedural generation and explain what you’re seeing.)

P.S: This code is still dirty today :rofl:


I want to hear the other stories, @Steph (and also see the GIF)!

Here we go!

As promised:

[GIF: genproc-optimize]

It shows the step-by-step generation of the hypothetical world map.
As you can see, there’s a grid that contains the map. Rooms in different colours appear on that grid. The first room (where you start) has its own colour. Then other rooms appear in other colours. Each colour corresponds to a set of rooms that becomes accessible once you obtain the matching key (the dot of the same colour as the rooms), or that is accessible from the beginning (the first purple ones).

The room connections (yes, in the real world we call those “doors”) have different colours as well. But to understand them, a bit of context is needed. The idea was to generate a world map with physically connected rooms. However, from the player’s point of view, some rooms have to appear between other rooms once a key/skill is obtained while travelling the map. So the world map needs some connections between rooms that are not physically attached to each other. Here is what the connection colours mean:

  • green: it’s a door (nothing more than a door)
  • yellow: an available candidate spot for placing a door and attaching another room during the process
  • red: an available candidate spot for placing a “virtual” door that links 2 rooms that are not physically connected. The virtual connection remains until the right key has been obtained. Once it is, the intermediate rooms become accessible (or, from the player’s point of view, appear).
  • the dashed lines mark 2 virtually connected doors
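Since the original code was never published, here is a loosely related Python sketch of key-gated growth on a grid, not Steph’s algorithm. It assumes every room is a single grid cell, that each zone (colour) is unlocked by one key, and that each zone’s key is hidden in a room from an earlier zone before that zone is grown; the “yellow” candidates are simply empty cells next to existing rooms. Red virtual doors are left out. All names (`generate_map`, `is_completable`, etc.) are hypothetical.

```python
import random

# The four neighbouring directions on the grid.
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def generate_map(width, height, zones, rooms_per_zone, seed=0):
    """Grow a map of 1x1 rooms on a grid, one key-coloured zone at a time.

    Returns (start, rooms, doors, keys):
      rooms: {(x, y): zone}; zone 0 is open from the start
      doors: set of frozenset({cell_a, cell_b}) physical doors
      keys:  {zone: (x, y)} cell holding the key that unlocks `zone`
    """
    rng = random.Random(seed)
    start = (width // 2, height // 2)
    rooms = {start: 0}
    doors = set()
    keys = {}
    for zone in range(zones):
        if zone > 0:
            # Hide this zone's key in a room from an earlier zone,
            # so it can be collected before it is needed.
            keys[zone] = rng.choice(sorted(rooms))
        placed = 1 if zone == 0 else 0  # the start room counts for zone 0
        while placed < rooms_per_zone:
            # "Yellow" candidates: empty in-bounds cells next to an existing room.
            candidates = sorted(
                ((x, y), (x + dx, y + dy))
                for (x, y) in rooms
                for dx, dy in DIRECTIONS
                if 0 <= x + dx < width
                and 0 <= y + dy < height
                and (x + dx, y + dy) not in rooms
            )
            if not candidates:
                raise ValueError("no free cell left on the grid")
            parent, new_cell = rng.choice(candidates)
            rooms[new_cell] = zone
            doors.add(frozenset((parent, new_cell)))  # a "green" door
            placed += 1
    return start, rooms, doors, keys


def is_completable(start, rooms, doors, keys):
    """Flood-fill from start, entering a room only once its zone's key is held."""
    held = {0}
    reached = {start}
    changed = True
    while changed:
        changed = False
        for zone, cell in keys.items():
            if cell in reached and zone not in held:
                held.add(zone)
                changed = True
        for door in doors:
            a, b = tuple(door)
            for src, dst in ((a, b), (b, a)):
                if src in reached and dst not in reached and rooms[dst] in held:
                    reached.add(dst)
                    changed = True
    return len(reached) == len(rooms)


start, rooms, doors, keys = generate_map(8, 6, zones=3, rooms_per_zone=5, seed=42)
print(len(rooms), len(doors), sorted(keys))       # 15 14 [1, 2]
print(is_completable(start, rooms, doors, keys))  # True
```

Placing each key before its zone exists is what makes every generated map completable in this sketch; the grid constraint only limits where new rooms can attach.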

Nice!
So it’s procedural generation of both the map and part of the gameplay.
Are you planning to make it open source?


Good question. Back then, no; I was very shy about showing my code. Nowadays, yes, I have no problem with that.
