Dr. Tom’s Cyber Bits & Tips: When “the cloud” hiccups, the internet coughs

Early Monday (Oct. 20, 2025), Amazon Web Services (AWS) suffered a major disruption centered in US-EAST-1 (N. Virginia). Popular apps and sites—from Alexa and Amazon services to Fortnite, Snapchat, Canva, Airtable, Zapier, Coinbase, and banking portals—were degraded or down. AWS said it was “investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region,” later determining “the event was the result of DNS resolution issues for the regional DynamoDB service endpoints.” The core problem was mitigated within a few hours, with full normalization later that day. For stragglers, AWS advised flushing DNS caches.

What actually broke?

Short version: DNS lookups for DynamoDB in US-EAST-1 misbehaved. That sounds narrow, but DynamoDB underpins thousands of workloads. When the endpoint can’t be reliably resolved, anything leaning on it—auth checks, data pipelines, APIs, device backends—can stall. Because so many companies still treat US-EAST-1 as the “center of gravity” for global controls (identity and configuration), the blast radius extended beyond the region. There’s no indication of a cyberattack—this was classic DNS fragility at hyperscale.

What is DNS—and why a wobble breaks everything?

The Domain Name System (DNS) is the internet’s address book. It translates names like example.com into numeric IP addresses so your devices know where to connect. Your resolver (ISP or public) fetches answers from the domain’s authoritative DNS and caches them for a time-to-live (TTL). If DNS is slow, unreachable, misconfigured, or serving stale answers, traffic can’t find its destination—even if the servers are healthy. In modern, service-rich apps, a single DNS hiccup for an upstream dependency (auth, database, API) can stall entire workflows, produce inconsistent behavior (because caches differ), and trigger retry storms that slow recovery. Good hygiene: redundant authoritative DNS, sensible TTLs, health checks, and a status page that doesn’t share the same failure domain.
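For readers who like to see the mechanics, here is a minimal sketch of the TTL caching described above. It is a toy model, not how any real resolver is implemented: the class name TtlDnsCache, the injected lookup function, and the example.com record are all assumptions made for illustration. It shows why two machines can disagree during an incident: a cached answer keeps being served until its TTL runs out, even after the upstream record has changed.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Toy resolver cache: remembers each answer until its TTL expires."""

    def __init__(self, lookup: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup  # hypothetical upstream: name -> (ip, ttl_seconds)
        self._clock = clock    # injectable so the demo can fake the passage of time
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]           # still fresh: upstream is not consulted
        ip, ttl = self._lookup(name)   # unknown or expired: ask upstream
        self._entries[name] = (ip, now + ttl)
        return ip

    def flush(self) -> None:
        """Drop every cached answer, similar in spirit to flushing a DNS cache."""
        self._entries.clear()


def demo() -> list:
    # Illustrative record only; 192.0.2.0/24 is reserved for documentation.
    records = {"db.example.com": ("192.0.2.10", 60.0)}
    fake_now = [0.0]
    cache = TtlDnsCache(lambda name: records[name], clock=lambda: fake_now[0])

    seen = [cache.resolve("db.example.com")]          # first lookup, cached for 60 s
    records["db.example.com"] = ("192.0.2.20", 60.0)  # upstream record changes
    fake_now[0] = 30.0
    seen.append(cache.resolve("db.example.com"))      # TTL not expired: old answer
    fake_now[0] = 61.0
    seen.append(cache.resolve("db.example.com"))      # TTL expired: new answer
    return seen
```

Calling demo() returns the old address twice and the new address once. Calling flush() before the second lookup would have picked up the change immediately, which is the idea behind the "flush your DNS cache" advice.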

Why so many organizations moved to AWS, Azure, and Google Cloud

Cloud won because it accelerates delivery while offloading undifferentiated heavy lifting. Teams deploy in minutes, scale elastically, and shift capex to usage-based spend. Rich managed services—databases, queues, storage, serverless, identity, analytics/AI—let builders assemble solutions instead of racking and patching boxes, and global regions reduce latency while aiding compliance. Strong security primitives and governance tooling come baked in, and a deep talent/vendor ecosystem lowers friction. The trade-offs: concentration risk if you bet on one region, cost surprises without FinOps discipline, and lock-in as you lean into proprietary services. The outage is a reminder: cloud accelerates you, but resilience still has to be engineered.

Practical takeaways

Reduce single-region risk: Run active-active or pilot-light in at least two regions, and don’t anchor “global” control planes (auth, config, orchestration) to US-EAST-1 or any single region.
Decouple global services: Split identity, configuration, and entitlements across regions/accounts with independent failover paths; avoid hard regional dependencies for shared services.
Back both with operations: Monitor DNS and regional dependencies, keep traffic-shift runbooks ready, require multi-AZ and multi-region high availability from vendors, and host status communications outside the affected stack.
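The takeaways above can be sketched in a few lines of client-side logic. This is only an illustration under stated assumptions: the hostnames are hypothetical, the function names pick_endpoint and dns_resolves are invented for this column, and a real deployment would more likely rely on health-checked DNS routing or a load balancer than on application code. The point is the shape of the idea: keep an ordered list of regional endpoints, check each one, and fall through to the next when the preferred region fails.

```python
import socket
from typing import Callable, Optional, Sequence


def dns_resolves(hostname: str) -> bool:
    """Return True if the hostname currently resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except OSError:  # socket.gaierror (a subclass of OSError) signals a failed lookup
        return False


def pick_endpoint(endpoints: Sequence[str],
                  is_healthy: Callable[[str], bool] = dns_resolves) -> Optional[str]:
    """Return the first endpoint, in preference order, that passes the health check."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None  # nothing healthy: the caller should degrade gracefully


# Hypothetical regional endpoints, listed in order of preference.
REGIONAL_ENDPOINTS = [
    "api.us-east-1.example.com",
    "api.us-west-2.example.com",
]
```

A DNS lookup is a deliberately shallow health check; it would have caught Monday's failure mode, but a production check would usually also confirm that the service answers requests. The health check is passed in as a parameter so that it can be swapped for something stronger or replaced with a fake in tests.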

Tips for everyday users & small businesses

Keep offline 2FA codes (or a second authenticator), maintain multiple payment options, download critical docs before travel/meetings, and avoid rapid-fire retries during outages—waiting 10–15 minutes often helps as systems relieve pressure. If a service still feels “sticky” after recovery, flush DNS caches and try again.
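If your business runs scripts or integrations that call cloud services, the same "don't hammer it" advice applies to code. Below is a minimal sketch of retrying with exponential backoff and random jitter, a common way to avoid feeding a retry storm. The function names and the default numbers (five attempts, delays in seconds capped at 60) are assumptions chosen for illustration, not recommendations from AWS or any vendor.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0,
                   rng: Callable[[], float] = random.random) -> List[float]:
    """Delay before retry n is random in [0, min(cap, base * 2**n)) ("full jitter")."""
    return [rng() * min(cap, base * (2 ** n)) for n in range(retries)]


def call_with_backoff(operation: Callable[[], T], attempts: int = 5,
                      base: float = 1.0, cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run the operation; after each failure wait a growing, randomized delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise               # out of attempts: surface the last error
            sleep(delays[attempt])  # pause before the next try
    raise AssertionError("unreachable")
```

The randomness matters as much as the growing delay: if every client waits exactly the same amount of time, they all come back at once and knock the recovering service over again.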

Bottom line

This wasn’t a hack; it was a reminder that our digital lives depend on a few highly concentrated chokepoints. The fix isn’t abandoning cloud—it’s designing like failure is normal: distribute critical services, plan fallback paths, and practice the switchovers before you need them.

Feeling lost in the digital world? Dr. Tom is here to help!

Join Dr. Tom every week in his column, Dr. Tom’s Cyber Bits & Tips, for byte-sized advice on all things cyber and tech. Whether you’re concerned about online safety, curious about the latest cybercrime trends, or simply want to navigate the ever-evolving digital landscape, Dr. Tom has you covered.

From practical cybersecurity tips to insightful breakdowns of current threats, Dr. Tom’s column empowers you to stay informed and protect yourself online. So, dive in and get savvy with the web – with Dr. Tom as your guide!