Google says a set of crushed wheels used for moving its server racks triggered a chain reaction that may have disrupted Search, Gmail, and other services for some users.
A rack of servers at one of its data centers started overheating to the point where CPUs were automatically throttled, ultimately because a set of rack wheels couldn't bear the weight of Google's cloud kit.
Steve McGhee, a solutions architect at Google Cloud, says Google users "most likely" wouldn't have noticed errors caused by the rack's crushed wheels. But the chain of events resulted in enough CPU throttling to cause "user harm".
Fortunately, the incident wasn't as serious as one from June last year, caused by a failure in Google's automation software, which took down Gmail, YouTube, and customers' applications. That incident prompted a big apology to customers and a commitment to do better in future.
This time, the company has decided to tell the story to illustrate the lengths it goes to in finding the root cause of disruptions, even when they don't noticeably impact users.
The latest event came to light when Google recently kicked off an investigation after a site reliability engineer noticed a spike in errors from machines on its edge network that cache content users frequently access. The machines were immediately taken offline to stop them impacting customers, allowing other machines to take up the slack.
Google engineers noticed some Border Gateway Protocol (BGP) network errors, but their characteristics suggested issues with the machines rather than the router. Further investigation turned up kernel messages on the edge-network machines that revealed CPU clock throttling.
The engineers found that the failing systems were isolated to machines on a single rack. All of this investigation was happening remotely. Unable to explain why the rack was overheating enough to cause kernel errors, the engineers then asked Google's on-site data-center workers to physically inspect the problem rack.
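The step of narrowing failures down to one rack can be illustrated with a minimal sketch. This is not Google's tooling; the host names, the `host_to_rack` inventory, the `suspect_rack` helper, and the 80% threshold are all hypothetical, chosen only to show the idea of grouping failing machines by physical location.

```python
from collections import Counter


def suspect_rack(failing_hosts, host_to_rack, threshold=0.8):
    """Return the rack that holds at least `threshold` of the failing hosts.

    Returns None when failures are spread across racks or when no
    failing host can be mapped to a rack.
    """
    racks = [host_to_rack[h] for h in failing_hosts if h in host_to_rack]
    if not racks:
        return None
    rack, count = Counter(racks).most_common(1)[0]
    return rack if count / len(racks) >= threshold else None


# Hypothetical inventory: which rack each edge machine sits in.
host_to_rack = {
    "edge-01": "rack-A",
    "edge-02": "rack-A",
    "edge-03": "rack-A",
    "edge-04": "rack-B",
}

# Hypothetical list of machines reporting CPU throttling.
failing = ["edge-01", "edge-02", "edge-03"]
print(suspect_rack(failing, host_to_rack))
```

When every throttled machine maps to the same rack, as in this made-up data, the result points at a shared physical cause rather than a software or network fault, which is the kind of evidence that would justify sending someone to look at the rack.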
Soon after, the data-center team reported back with a brief message and a picture of the rack's crushed wheels.
"Hello, we have inspected the rack. The casters on the rear wheels have failed and the machines are overheating as a consequence of being tilted," the team explained.
"The wheels (casters) supporting the rack had been crushed under the weight of the fully loaded rack," said McGhee.
"The rack then had physically tilted forward, disrupting the flow of liquid coolant and resulting in some CPUs heating up to the point of being throttled."
It's not clear why the wheels were crushed, but Google engineers feared it could be a more widespread problem, so they replaced all the racks that could be vulnerable to the same broken-wheel tilting issue.
The problem has caused Google to reconsider how it moves new racks into its data centers when they're being built.
Google's engineers discovered that casters on the rear wheels had failed, ultimately causing the machines to overheat.
The alarming tilt of a refrigeration unit also pointed to the underlying problem.