QUIC restarts, slow problems: udpgrm to the rescue
At Cloudflare, we do everything we can to avoid interruption to our services. We frequently deploy new versions of the code that delivers the services, so we need to be able to restart the server processes to upgrade them without missing a beat. In particular, performing graceful ("zero-downtime") restarts of UDP servers has proven surprisingly difficult. We've previously written about graceful restarts in the context of TCP, which is much easier to handle. We didn't have a strong reason to deal with UDP until recently, when protocols like HTTP/3 and QUIC became critical. This blog post introduces udpgrm, a lightweight daemon that helps us upgrade UDP servers without dropping a single packet. ...
How Halo on Xbox Scaled to 10+ Million Players using the Saga Pattern
The 6 Core Competencies of Mature DevSecOps Orgs (Sponsored) Understand the core competencies that define mature DevSecOps organizations. This whitepaper offers a clear framework to assess your organization's current capabilities, define where you want to be, and outline practical steps to advance in your journey. Evaluate and strengthen your DevSecOps practices with Datadog's maturity model. ...
Making complex text understandable: Minimally-lossy text simplification with Gemini
About 20 Pounds
Enhancing the Python ecosystem with type checking and free threading
Meta and Quansight have improved key libraries in the Python ecosystem. There is plenty more to do, and we invite the community to help with our efforts. We’ll look at two key efforts in Python’s packaging ecosystem to make packages faster and easier to use: (1) unlocking performance wins for developers through free-threaded Python, where we leverage Python 3.13’s support for concurrent programming (made possible by removing the Global Interpreter Lock (GIL)); and (2) increasing developer velocity in the IDE with improved type annotations. Enhancing typed Python in the Python scientific stack: Type hints, introduced in Python 3.5 with PEP 484, allow developers to specify variable types, enhancing code understanding without affecting runtime behavior. Type checkers validate these annotations, helping prevent bugs and improving IDE functions like autocomplete and jump-to-definition. Despite their benefits, adoption is inconsistent across the open source ecosystem, with varied approaches to specifying and maintaining type annotations. ...
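To make the point about type hints not affecting runtime behavior concrete, here is a minimal sketch. The function name `double` is made up for illustration and does not come from the article or from any library it discusses.

```python
def double(x: int) -> int:
    """Return twice the input. The annotations say the input should be an int."""
    return x * 2


# The annotations are stored as metadata on the function object.
print(double.__annotations__)

# Matches the annotation: a type checker accepts this call.
print(double(21))

# Violates the annotation: a static type checker would be expected to flag
# this call, but the interpreter does not enforce hints, so it still runs
# and repeats the string.
print(double("ab"))
```

Running the file with the plain interpreter raises no error on the last call; the mismatch only surfaces when a type checker or an IDE analyzes the code, which is why consistent annotations matter for tooling.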
How Canva Collects 25 Billion Events a Day
ACI.dev: The Only MCP Server Your AI Agents Need (Sponsored) ACI.dev’s Unified MCP Server provides every API your AI agents will need through just one MCP server and two functions. One connection unlocks 600+ integrations with built-in multi-tenant auth and natural-language permission scopes. Plug & Play – Framework-agnostic, works with any architecture. Secure – Tenant isolation for your agent’s end users. ...
Scaling with safety: Cloudflare's approach to global service health metrics and software releases
Has your browsing experience ever been disrupted by this error page? Sometimes Cloudflare returns "Error 500" when our servers cannot respond to your web request. This inability to respond could have several potential causes, including problems caused by a bug in one of the services that make up Cloudflare's software stack. We know that our testing platform will inevitably miss some software bugs, so we built guardrails to gradually and safely release new code before a feature reaches all users. Health Mediated Deployments (HMD) is Cloudflare’s data-driven solution to automating software updates across our global network. HMD works by querying Thanos, a system for storing and scaling Prometheus metrics. Prometheus collects detailed data about the performance of our services, and Thanos makes that data accessible across our distributed network. HMD uses these metrics to determine whether new code should continue to roll out, pause for further evaluation, or be automatically reverted to prevent widespread issues. ...
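The three outcomes described above (continue, pause, revert) can be pictured as a simple threshold check on a health metric. The sketch below is purely illustrative and is not Cloudflare's implementation: the function name `evaluate_release`, the choice of error rate as the metric, and both threshold values are assumptions, and the metric is passed in directly instead of being queried from Thanos.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    PAUSE = "pause"
    REVERT = "revert"


def evaluate_release(
    error_rate: float,
    pause_above: float = 0.01,   # hypothetical threshold: 1% of requests failing
    revert_above: float = 0.05,  # hypothetical threshold: 5% of requests failing
) -> Decision:
    """Map an observed error rate for the new code to a rollout decision."""
    if error_rate > revert_above:
        # Clearly unhealthy: roll back before the issue spreads.
        return Decision.REVERT
    if error_rate > pause_above:
        # Ambiguous: hold the rollout for further evaluation.
        return Decision.PAUSE
    # Healthy: keep rolling out.
    return Decision.CONTINUE


for rate in (0.001, 0.02, 0.10):
    print(rate, evaluate_release(rate).value)
```

A real system would presumably weigh several metrics over a time window; a single scalar is used here only to keep the decision structure visible.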
EP161: A Cheatsheet on REST API Design Best Practices
WorkOS + MCP: Authorization for AI Agents (Sponsored) Wide-open access to every tool on your MCP server is a major security risk. Unchecked access can quickly lead to serious incidents. Teams need a fast, easy way to lock down access with roles and permissions. WorkOS AuthKit makes it simple with RBAC: assign roles, enforce permissions, and control exactly who can access critical tools. ...