Insights on CrowdStrike Outage: Balancing Updates & Infrastructure
Key insights
- 💻 Former IT infrastructure engineer comments on the CrowdStrike outage
- 🔍 Shares insights from past experience working in cyber security infrastructure
- 👀 Downplays the severity of the event based on previous corporate experience
- 🔒 Notes overdependence on a single security provider and frames the issue as a process failure
- ⚙️ Highlights the complexity of the problem and dismisses unfounded blame on developers
- 🔄 Problem with release cycles and patching policies
- ⏳ Need for a better testing cycle for patches
- 🔧 Corporations need more power to control updates
- ⚠️ Cautious testing and deployment of updates are crucial to avoid system downtime
- ⚠️ Constant patching caused more problems than it solved
- 🔒 No rollback strategy led to extended troubleshooting
- 🤖 Automation became essential for addressing these challenges
- 🔍 Emphasis on thorough testing to prevent global outages
- 🚨 Concerns about bad default policies leading to outages
- 😅 Minor panic about a vulnerability, but not a big deal
- 🔑 Emphasizing the importance of following best practices for prevention
Q&A
What does the speaker say about the panic over a vulnerability and bad IT policies?
The speaker mentions a minor panic about a vulnerability and downplays it as not a big deal. They note that every corporation carries technical debt, blame bad IT policies in poorly run IT departments, and emphasize following best practices for prevention.
How did an infrastructure engineer approach updating and patching, and what did it highlight about organizations' handling of updates?
An infrastructure engineer automated the updating and patching of their entire environment but emphasized the importance of thorough testing to prevent global outages caused by bad default policies. The incident highlighted the need for organizations to be cautious with automatic updates and to prioritize testing cycles.
What issues arose from constant patching and lack of rollback strategy?
Constant patching caused more problems than it solved, leading to servers crashing due to known conflicts with hardware. The lack of a rollback strategy led to extended troubleshooting. Similar issues occurred with storage drivers, causing server disconnection. Automation became essential for addressing these challenges.
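As a rough illustration of the rollback strategy the speaker says was missing, the sketch below records the known-good version before patching and restores it if a health check fails. Everything in it is hypothetical: the `Server` class, the `apply_with_rollback` helper, the version strings, and the hardware-conflict check are invented for the example and do not describe any tool mentioned in the video.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Server:
    """Hypothetical stand-in for a managed host; not a real inventory API."""
    name: str
    driver_version: str
    history: List[str] = field(default_factory=list)


def apply_with_rollback(server: Server, new_version: str,
                        health_check: Callable[[Server], bool]) -> bool:
    """Apply a patch, then revert to the previous version if the check fails."""
    previous = server.driver_version  # record the known-good state first
    server.driver_version = new_version
    server.history.append(f"applied {new_version}")

    if health_check(server):
        return True

    # Health check failed: restore the recorded version instead of debugging live.
    server.driver_version = previous
    server.history.append(f"rolled back to {previous}")
    return False


def no_known_conflict(server: Server) -> bool:
    # Assumed example: version "2.0" conflicts with this server's hardware.
    return server.driver_version != "2.0"


srv = Server("db-01", "1.4")
ok = apply_with_rollback(srv, "2.0", no_known_conflict)
print(ok, srv.driver_version, srv.history)
```

The point of the design is that the previous state is captured before anything changes, so a failed patch costs one revert rather than an extended troubleshooting session.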
What does the speaker emphasize regarding security updates?
The speaker emphasizes the need for thorough investigations into security updates and advocates for cautious testing and deployment of updates to avoid potential system corruption. They highlight the significance of treating all updates with equal scrutiny and due diligence.
What are the problems related to release cycles and patching policies according to the speaker?
According to the speaker, patches need a better testing cycle and corporations need more power to control when updates are applied. Patches are released in known cycles, but emergency fixes and zero-day exploits also require immediate attention.
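One way to picture a testing cycle that still leaves room for emergency fixes is a ring-based rollout: a patch goes to a small test group first and only moves on if every host in that group succeeds. The sketch below is a minimal illustration under stated assumptions. The ring names, host names, soak times, and the `staged_rollout` and `fake_deploy` functions are all hypothetical and are not taken from the video or from any vendor's product.

```python
from typing import Callable, Dict, List

# Hypothetical rings, ordered from least to most critical.
RINGS: Dict[str, List[str]] = {
    "lab": ["test-01"],
    "pilot": ["app-01", "app-02"],
    "production": ["db-01", "db-02", "web-01"],
}

# Assumed soak times in hours before promoting a patch to the next ring.
SOAK_HOURS = {"routine": 72, "emergency": 4}


def staged_rollout(patch_id: str, urgency: str,
                   deploy: Callable[[str, str], bool]) -> List[str]:
    """Deploy ring by ring; stop at the first ring where any host fails.

    Returns the names of the rings that were fully patched.
    """
    soak = SOAK_HOURS[urgency]
    completed: List[str] = []
    for ring, hosts in RINGS.items():
        results = [deploy(host, patch_id) for host in hosts]
        if not all(results):
            print(f"{patch_id}: failure in ring '{ring}', halting rollout")
            break
        completed.append(ring)
        print(f"{patch_id}: ring '{ring}' ok, soak {soak}h before next ring")
    return completed


def fake_deploy(host: str, patch_id: str) -> bool:
    # Simulated outcome: the patch breaks one pilot host.
    return host != "app-02"


done = staged_rollout("patch-001", "emergency", fake_deploy)
print(done)
```

In this simulated run the failure in the pilot ring stops the rollout before any production host is touched. An emergency patch follows the same path as a routine one, only with a shorter assumed soak time.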
What insights does the former IT infrastructure engineer share about the CrowdStrike outage?
The former IT infrastructure engineer comments on the CrowdStrike outage, sharing insights from past experience working in cyber security infrastructure. They downplay the severity of the event based on previous corporate experience and note the overdependence on a single security provider. They emphasize the issue as a process failure and highlight the complexity of the problem, dismissing unfounded blame on developers.
- 00:00 A former IT infrastructure engineer shares insights on the CrowdStrike outage, emphasizing how common such events are in the corporate world and highlighting the complexity of the issue.
- 01:41 The problem is with release cycles and patching policies. There needs to be a better testing cycle for patches, and more power to corporations to control updates. Patches are released in known cycles, but emergency and zero-day exploits also require immediate attention.
- 03:21 The speaker emphasizes the need for thorough investigations into security updates and advocates for cautious testing and deployment of updates to avoid potential system corruption. They highlight the significance of treating all updates with equal scrutiny and due diligence.
- 04:54 An architect's insistence on constant patching led to unforeseen issues, including server crashes that required long troubleshooting sessions. The lack of a rollback strategy and similar issues with storage drivers highlighted the challenges of always being up to date. Automation became crucial for dealing with such problems.
- 06:34 An infrastructure engineer automated the updating and patching of their entire environment but emphasized the importance of thorough testing to prevent global outages caused by bad default policies. The incident highlighted the need for organizations to be cautious with automatic updates and to prioritize testing cycles.
- 08:14 There's a minor panic about a vulnerability, but it's not a big deal. Every corporation has technical debt. Bad IT policies are to blame, and following best practices is key for prevention.