Your frontend, backend, and database — now in one Cloudflare Worker

In September 2024, we introduced beta support for hosting, storing, and serving static assets for free on Cloudflare Workers — something that was previously only possible on Cloudflare Pages. Being able to host these assets — your client-side JavaScript, HTML, CSS, fonts, and images — was a critical missing piece for developers looking to build a full-stack application within a single Worker. Today we’re announcing ten big improvements to building apps on Cloudflare. All together, these new additions allow you to build and host projects ranging from simple static sites to full-stack applications, all on Cloudflare Workers: ...
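For a sense of what that looks like in practice, here is a minimal sketch of a single Worker serving both an API route and static assets. The binding name ASSETS and the ./public directory are illustrative assumptions; the actual names come from your Wrangler configuration:

```ts
// Minimal full-stack Worker sketch: one Worker serves both the backend API
// and the static frontend. Assumes Wrangler config along the lines of:
//   [assets]
//   directory = "./public"
//   binding = "ASSETS"
// (binding name and route below are illustrative)
interface Env {
  ASSETS: Fetcher; // binding to the uploaded static assets
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Backend: handle API routes in the Worker itself.
    if (url.pathname.startsWith('/api/')) {
      return Response.json({ message: 'Hello from the backend!' });
    }

    // Frontend: fall through to the static assets (HTML, CSS, JS, images).
    return env.ASSETS.fetch(request);
  },
} satisfies ExportedHandler<Env>;
```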

April 8, 2025

Cloudflare acquires Outerbase to expand database and agent developer experience capabilities

I’m thrilled to share that Cloudflare has acquired Outerbase. This is such an amazing opportunity for us, and I want to explain how we got here, what we’ve built so far, and why we are so excited about becoming part of the Cloudflare team. Databases are key to building almost any production application: you need to persist state for your users (or agents), be able to query it from a number of different clients, and you want it to be fast. But databases aren’t always easy to use: designing a good schema, writing performant queries, creating indexes, and optimizing your access patterns tends to require a lot of experience. Add that to exposing your data through easy-to-grok APIs that make the ‘right’ way to do things obvious, a great developer experience (from dashboard to CLI), and well… there’s a lot of work involved. ...

April 7, 2025

Cloudflare Workflows is now GA: production-ready durable execution

Betas are useful for feedback and iteration, but at the end of the day, not everyone is willing to be a guinea pig or can tolerate the occasional sharp edge that comes along with beta software. Sometimes you need that big, shiny “Generally Available” label (or blog post), and now it’s Workflows’ turn. Workflows, our serverless durable execution engine that allows you to build long-running, multi-step applications (some call them “step functions”) on Workers, is now GA. ...
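As a sketch of the programming model (the step names and payload here are illustrative), a Workflow is a class whose run method chains durable steps; each step's result is persisted, so the whole sequence survives restarts and can span long periods of time:

```ts
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from 'cloudflare:workers';

type Env = { /* bindings go here */ };
type Params = { userId: string }; // illustrative payload

export class ExampleWorkflow extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    // Each step is retried independently and its result is persisted.
    const user = await step.do('fetch user', async () => {
      return { id: event.payload.userId, plan: 'free' };
    });

    // Durable sleep: the engine suspends the workflow while it waits.
    await step.sleep('wait a day', '1 day');

    await step.do('send follow-up', async () => {
      // e.g. call an email API here, using `user` from the earlier step
      console.log(`following up with ${user.id}`);
    });
  }
}
```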

April 7, 2025

Introducing AutoRAG: fully managed Retrieval-Augmented Generation on Cloudflare

Today we’re excited to announce AutoRAG in open beta, a fully managed Retrieval-Augmented Generation (RAG) pipeline powered by Cloudflare, designed to simplify how developers integrate context-aware AI into their applications. RAG is a method that improves the accuracy of AI responses by retrieving information from your own data, and providing it to the large language model (LLM) to generate more grounded responses. Building a RAG pipeline today means wiring together a patchwork of moving parts. You have to stitch together multiple tools and services — your data storage, a vector database, an embedding model, LLMs, and custom indexing, retrieval, and generation logic — all just to get started. Maintaining it is even harder. As your data changes, you have to manually reindex and regenerate embeddings to keep the system relevant and performant. What should be a simple “ask a question, get a smart answer” experience becomes a brittle pipeline of glue code, fragile integrations, and constant upkeep. ...
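For contrast, here is roughly what that hand-rolled glue code looks like on Workers: a hedged sketch assuming a Workers AI binding (AI), a Vectorize index binding (VECTORIZE), and illustrative model names:

```ts
// A sketch of the manual RAG glue code that a managed pipeline replaces.
// Binding names (AI, VECTORIZE) and model IDs are illustrative assumptions.
export interface Env {
  AI: Ai;                    // Workers AI binding
  VECTORIZE: VectorizeIndex; // Vectorize index binding
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { question } = (await request.json()) as { question: string };

    // 1. Embed the question.
    const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: [question],
    });

    // 2. Retrieve the closest chunks from the vector index.
    const { matches } = await env.VECTORIZE.query(data[0], {
      topK: 3,
      returnMetadata: true,
    });
    const context = matches.map((m) => m.metadata?.text).join('\n');

    // 3. Generate a grounded answer with the LLM.
    const answer = await env.AI.run('@cf/meta/llama-3.3-70b-instruct-fp8-fast', {
      messages: [
        { role: 'system', content: `Answer using this context:\n${context}` },
        { role: 'user', content: question },
      ],
    });
    return Response.json(answer);
  },
} satisfies ExportedHandler<Env>;
```

And this is just the query path: the indexing side (chunking documents, generating and refreshing embeddings as data changes) is a separate pipeline you would otherwise have to build and operate yourself.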

April 7, 2025

Piecing together the Agent puzzle: MCP, authentication & authorization, and Durable Objects free tier

It’s not a secret that at Cloudflare we are bullish on the future of agents. We’re excited about a future where AI can not only co-pilot alongside us, but where we can actually start to delegate entire tasks to AI. It hasn’t been long since we first announced our Agents SDK to make it easier for developers to build agents, but building towards an agentic future requires that we keep shipping. Today, we’re making several announcements to help accelerate agentic development, including: ...
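To give a flavor of what building on these pieces looks like, here is a hedged sketch of a minimal remote MCP server using the Agents SDK. The class name, tool, and mount path are illustrative, and the exact API shapes should be checked against the SDK docs:

```ts
import { McpAgent } from 'agents/mcp';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

// A single-tool MCP server running as a Durable Object-backed agent.
export class DemoMCP extends McpAgent {
  server = new McpServer({ name: 'demo', version: '1.0.0' });

  async init() {
    // Tools are what a remote LLM client is allowed to call.
    this.server.tool(
      'add',
      { a: z.number(), b: z.number() },
      async ({ a, b }) => ({
        content: [{ type: 'text', text: String(a + b) }],
      })
    );
  }
}

// Route incoming client connections to the agent over server-sent events.
export default DemoMCP.mount('/sse');
```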

April 7, 2025

Meta’s Llama 4 is now available on Workers AI

As one of Meta’s launch partners, we are excited to make Meta’s latest and most powerful model, Llama 4, available on the Cloudflare Workers AI platform starting today. Check out the Workers AI Developer Docs to begin using Llama 4 now.

What’s new in Llama 4?

Llama 4 is an industry-leading release that pushes forward the frontiers of open-source generative Artificial Intelligence (AI) models. Llama 4 relies on a novel design that combines a Mixture of Experts architecture with an early-fusion backbone that allows it to be natively multimodal.

The Llama 4 “herd” is made up of two models: Llama 4 Scout (109B total parameters, 17B active parameters) with 16 experts, and Llama 4 Maverick (400B total parameters, 17B active parameters) with 128 experts. The Llama 4 Scout model is available on Workers AI today.

Llama 4 Scout has a context window of up to 10 million (10,000,000) tokens, which makes it one of the first open-source models to support a window of that size. A larger context window makes it possible to hold longer conversations, deliver more personalized responses, and support better Retrieval Augmented Generation (RAG). For example, users can take advantage of that increase to summarize multiple documents or reason over large codebases. At launch, Workers AI supports a context window of 131,000 tokens, and we’ll be working to increase this in the future.

Llama 4 does not trade parameter depth for speed. Although it has 109 billion total parameters, its Mixture of Experts (MoE) architecture activates only a fraction of them during inference, delivering faster responses backed by the quality of the full 109B-parameter model.

What is a Mixture of Experts model?

A Mixture of Experts (MoE) model is a type of Sparse Transformer model (https://arxiv.org/abs/2209.01667) composed of individual specialized neural networks called “experts”. MoE models also have a “router” component that decides which experts each input token is sent to. These specialized experts work together to provide deeper results and faster inference times, increasing both model quality and performance.

[Figure: a router dispatching input tokens to specialized expert networks]

For an illustrative example, say one expert is really good at generating code while another is really good at creative writing. When a request comes in to write a Fibonacci algorithm in Haskell, the router sends the input tokens to the coding expert. The remaining experts can stay unactivated, so the model only needs to run the smaller, specialized neural network to solve the problem. In the case of Llama 4 Scout, this means the model uses only one expert (17B parameters) instead of the full 109B total parameters. In reality, the model probably needs multiple experts to handle a request, but the point still stands: an MoE architecture is remarkably efficient for the breadth of problems it can handle and the speed at which it handles them.
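To make the routing idea concrete, here is a toy top-1 router in TypeScript. It is purely illustrative: in a real model the gate is a learned layer operating on token embeddings inside each transformer block, not hand-written functions:

```ts
// Toy top-1 Mixture of Experts routing (illustrative only).
type Expert = (token: number[]) => number[];

const experts: Expert[] = [
  (t) => t.map((x) => x * 2), // stand-in for the "coding" expert
  (t) => t.map((x) => x + 1), // stand-in for the "creative writing" expert
];

// In a real model the gate scores come from a learned layer; here they are given.
function routeToken(token: number[], gateScores: number[]): number[] {
  // Activate only the single best-scoring expert; the rest stay idle,
  // which is why only ~17B of Llama 4 Scout's 109B parameters run per token.
  const best = gateScores.indexOf(Math.max(...gateScores));
  return experts[best](token);
}

console.log(routeToken([1, 2, 3], [0.9, 0.1])); // -> [2, 4, 6]
```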
MoE also makes it more efficient to train models; we recommend reading Meta’s blog post (https://ai.meta.com/blog/llama-4-multimodal-intelligence/) on how they trained the Llama 4 models. While more efficient to train, an MoE model can be more challenging to host for inference: you need to load the full model weights (over 200 GB) into GPU memory, and supporting a larger context window requires keeping more memory available in a key-value cache.

Thankfully, Workers AI solves this by offering Llama 4 Scout as a serverless model, meaning you don’t have to worry about infrastructure, hardware, or memory: we do all of that for you, so you are only one API request away from interacting with Llama 4.

What is early-fusion?

One challenge in building AI-powered applications is the need to combine multiple models, such as a Large Language Model (LLM) and a vision model, to deliver a complete experience for the user. Llama 4 solves that problem by being natively multimodal, meaning the model can understand both text and images.

You might recall that Llama 3.2 11b was also a vision model, but it used separate parameters for vision and text: when you sent an image request to the model, only the vision parameters were used to understand the image.

With Llama 4, all the parameters natively understand both text and images. This allowed Meta to train the model parameters with large amounts of unlabeled text, image, and video data together. For the user, this means you don’t have to chain together a vision model and an LLM for a multimodal experience: you can do it all with Llama 4.

Try it out now!

We are excited to be one of Meta’s launch partners, making it effortless for developers to use Llama 4 in Cloudflare Workers AI. The release brings an efficient, multimodal, highly capable, open-source model to anyone who wants to build AI-powered applications.

Cloudflare’s Developer Platform makes it possible to build complete applications that run alongside our Llama 4 inference; you can rely on our compute, storage, and agent layers running seamlessly with inference from models like Llama 4. To learn more, head over to our developer docs model page (https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct) for more information on using Llama 4 on Workers AI, including pricing, additional terms, and acceptable use policies.

Want to try it out without an account? Visit our AI playground (https://playground.ai.cloudflare.com/), or get started building your AI experiences with Llama 4 and Workers AI.
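And if you’d rather start from code, here is a minimal sketch of calling the model from a Worker. The model ID matches the developer docs URL above; the binding name AI is an assumption based on a typical Wrangler setup:

```ts
interface Env {
  AI: Ai; // Workers AI binding, configured in your Wrangler config (assumed name)
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // Model ID taken from the developer docs URL above.
    const result = await env.AI.run('@cf/meta/llama-4-scout-17b-16e-instruct', {
      messages: [
        { role: 'user', content: 'Explain Mixture of Experts in one paragraph.' },
      ],
    });
    return Response.json(result);
  },
} satisfies ExportedHandler<Env>;
```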

April 6, 2025

Welcome to Developer Week 2025

We’re kicking off Cloudflare’s 2025 Developer Week — our innovation week dedicated to announcements for developers. It’s an exciting time to be a developer. In fact, as a developer, the past two years might have felt a bit like every week is Developer Week. Starting with the release of ChatGPT, it has felt like each day has brought a new, disruptive announcement, whether it’s new models, hardware, agents, or other tools. Between late 2024 and the first few months of 2025 alone, we’ve seen the DeepSeek model challenge assumptions about what it takes to train a new state-of-the-art model, MCP introduce a new standard for how LLMs interface with the world, and OpenAI’s 4o image generation Ghiblify the world. ...

April 6, 2025

EP157: How to Learn Backend Development?

WorkOS Radar: Smarter protection with device fingerprinting (Sponsored)

WorkOS Radar leverages advanced device fingerprinting to protect your platform from fraudulent activity, including fake signups, throwaway emails, and brute-force attacks. With WorkOS Radar, you can:

- Identify and challenge suspicious activity before it impacts your platform
- Prevent free-tier abuse and fraudulent access with precision detection
- Tailor threat detection and mitigation to fit your app’s exact needs

...

April 5, 2025