osmos::feed

Open

TIL: Smart glasses aren’t just for pricks, they are an accessibility aid

I wasn’t a fan of smart glasses. Mostly because of the way they were advertised as a tool for influencers, people who constantly want to stream or those who need to always have the newest and coolest in terms of gadgets. I also see them as a privacy and security worry – there were quite […]
Open

In Immersive Mixed-Media Tapestries, Lillian Blades Reflects on Pattern and Presence

'Through the Veil,' now on view at Sarasota Art Museum, marks the artist's first institutional solo exhibition. Do stories and artists like this matter to you? Become a Colossal Member today and support independent arts publishing for as little as $7 per month. The article In Immersive Mixed-Media Tapestries, Lillian Blades Reflects on Pattern and Presence appeared first on Colossal.
Open

Meet Accessible UX Research, A Brand-New Smashing Book

Meet “Accessible UX Research,” our upcoming book to make your UX research inclusive. Learn how to recruit, plan, and design with disabled participants in mind. Print shipping in August 2025. eBook available for download later this summer. Pre-order the book.
Open

xAI Raising Money, xAI and Oracle, Xbox = Windows

Everyone wants xAI to exist, but is anyone actually using it? Then, Xbox as it once existed is dead; it's just Windows now.
Open

Creating a semantic color palette

On Monday, we looked at how to create an accessible color palette. Today, we’re going to learn how to take that palette and use it to create semantic color variables that we can use throughout our design system. This approach is at the heart of Kelp, my UI library for people who love HTML. Let’s dig in! What are semantic colors? In Monday’s article, we built out a collection of CSS variables that use the color’s name and shade: ( 17 min )
Open

I counted all of the yurts in Mongolia using machine learning

I counted all of the yurts in Mongolia using machine learning mercantile for tile calculations, Label Studio for help label the first 10,000 examples, a model trained on top of YOLO11 and a bunch of clever custom Python code to co-ordinate a brute force search across 120 CPU workers running the model. Via Hacker News Tags: machine-learning, geospatial, ai, python ( 1 min )
Open

The unsung principles of RedwoodSDK

We had Peter Pistorius on ShopTalk to talk about RedwoodJS and the project’s pivot to an almost entirely different project called RedwoodSDK. I am a complete outsider but I liked what RedwoodJS (the old project) was trying to do and didn’t fully understand why they felt the need to reboot. I even have a dusty old post in my drafts folder about what I liked about RedwoodJS. But after talking, it seems the winds of the JavaScript zeitgeist has changed and technology picks from 2020 aren’t the best deep integrations to have anymore. After talking to Peter, I was pleasantly surprised by the principles that guide the new RedwoodSDK project: Zero magic - No codegen or transpiler side effects Composability over configuration - No opinionated wrappers Uses native Web APIs - No abstraction over fe… ( 3 min )

Chekuskin's dream

Chekuskin dreamed he was in a factory sidling up the walkspace, besides some immense machine. But when he put his hand on it to steady himself, instead of cold metal the surface he felt was lively and warm. Little tremors ran through it, but not mechanical ones. The machine he saw was viley alive. Beneath a membrane of purpleish black, fluids were pulsing thickly from chamber to chamber. He stepped back, but his hand would not come free. It had stuck to the machine and now he realized there was no real palm to his hand anymore. He could no more pull away than he could pull his arm off. His arm, his whole body, were outgrowths of the machine. Just a siphon in a man’s shape through which the same fluid sluggishly circulated. But then the walls were gone, but the machine remained. It stretched away into snowy darkness. Somehow because he was part of it, he could feel its vastness. At its edges it was tirelessly eating whatever remained in the world that was not yet it. And it consumed its own wastes too. It was warm and poisonous, and it grew and grew and grew. But in the morning. He felt much better. The dream washed away in a hot shower. – Chekuskin’s dream from the end of Part IV of Red Plenty by Francis Spufford ( 3 min )
Open

Coding a 3D Audio Visualizer with Three.js, GSAP & Web Audio API

A music-driven visualizer where a glowing 3D orb pulses and spikes to the beat while GSAP-draggable panels drift around it with smooth, inertia-powered motion.

Open

Seven replies to the viral Apple reasoning paper – and why they fall short

Seven replies to the viral Apple reasoning paper – and why they fall short The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. I skimmed the paper and it struck me as a more thorough example of the many other trick questions that expose failings in LLMs - this time involving puzzles such as the Tower of Hanoi that can have their difficulty level increased to the point that even "reasoning" LLMs run out of output tokens and fail to complete them. I thought this paper got way more attention than it warranted - the title "The Illusion of Thinking" captured the attention of the "LLMs are over-hyped junk" crowd. I saw enough well-reasoned rebuttals that I didn't feel it worth digging into. And now, notable LLM skeptic Gary Marcus has saved me some time by aggregating the best of those rebuttals together in one place! Gary rebuts those rebuttals, but given that his previous headline concerning this paper was a knockout blow for LLMs? it's not surprising that he finds those arguments unconvincing. From that previous piece: The vision of AGI I have always had is one that combines the strengths of humans with the strength of machines, overcoming the weaknesses of humans. I am not interested in a “AGI” that can’t do arithmetic, and I certainly wouldn’t want to entrust global infrastructure or the future of humanity to such a system. Then from his new post: The paper is not news; we already knew these models generalize poorly. True! (I personally have been trying to tell people this for almost thirty years; Subbarao Rao Kambhampati has been trying his best, too). But then why do we think these models are the royal road to AGI? And therein lies my disagreement. I'm not interested in whether or not LLMs are the "road to AGI". I continue to care only about whether they have useful applications today, once you've understood their limitations. Reasoning LLMs are a relatively new and interesting twist on the genre. They are demonstrably able to solve a whole bunch of problems that previous LLMs were unable to handle, hence why we've seen a rush of new models from OpenAI and Anthropic and Gemini and DeepSeek and Qwen and Mistral. They get even more interesting when you combine them with tools. They're already useful to me today, whether or not they can reliably solve the Tower of Hanoi or River Crossing puzzles. Update: Gary clarifies that "the existence of some utility does not mean I can’t also address the rampant but misguided claims of imminent AGI". Via Hacker News Tags: llm-reasoning, apple, llms, ai, generative-ai ( 2 min )

An Introduction to Google’s Approach to AI Agent Security

Here's another new paper on AI agent security: An Introduction to Google’s Approach to AI Agent Security, by Santiago Díaz, Christoph Kern, and Kara Olive. (I wrote about a different recent paper, Design Patterns for Securing LLM Agents against Prompt Injections just a few days ago.) This Google paper describes itself as "our aspirational framework for secure AI agents". It's a very interesting read. Because I collect definitions of "AI agents", here's the one they use: AI systems designed to perceive their environment, make decisions, and take autonomous actions to achieve user-defined goals. The two key risks The paper describes two key risks involved in deploying these systems. I like their clear and concise framing here: The primary concerns demanding strategic focus are rogue actions (unintended, harmful, or policy-violating actions) and sensitive data disclosure (unauthorized revelation of private information). A fundamental tension exists: increased agent autonomy and power, which drive utility, correlate directly with increased risk. The paper takes a less strident approach than the design patterns paper from last week. That paper clearly emphasized that "once an LLM agent has ingested untrusted input, it must be constrained so that it is impossible for that input to trigger any consequential actions". This Google paper skirts around that issue, saying things like this: Security implication: A critical challenge here is reliably distinguishing trusted user commands from potentially untrusted contextual data and inputs from other sources (for example, content within an email or webpage). Failure to do so opens the door to prompt injection attacks, where malicious instructions hidden in data can hijack the agent. Secure agents must carefully parse and separate these input streams. Questions to consider: What types of inputs does the agent process, and can it clearly distinguish trusted user inputs from potentially untrusted contextual inputs? Then when talking about system instructions: Security implication: A crucial security measure involves clearly delimiting and separating these different elements within the prompt. Maintaining an unambiguous distinction between trusted system instructions and potentially untrusted user data or external content is important for mitigating prompt injection attacks. Here's my problem: in both of these examples the only correct answer is that unambiguous separation is not possible! The way the above questions are worded implies a solution that does not exist. Shortly afterwards they do acknowledge exactly that (emphasis mine): Furthermore, current LLM architectures do not provide rigorous separation between constituent parts of a prompt (in particular, system and user instructions versus external, untrustworthy inputs), making them susceptible to manipulation like prompt injection. The common practice of iterative planning (in a “reasoning loop”) exacerbates this risk: each cycle introduces opportunities for flawed logic, divergence from intent, or hijacking by malicious data, potentially compounding issues. Consequently, agents with high autonomy undertaking complex, multi-step iterative planning present a significantly higher risk, demanding robust security controls. This note about memory is excellent: Memory can become a vector for persistent attacks. If malicious data containing a prompt injection is processed and stored in memory (for example, as a “fact” summarized from a malicious document), it could influence the agent’s behavior in future, unrelated interactions. And this section about the risk involved in rendering agent output: If the application renders agent output without proper sanitization or escaping based on content type, vulnerabilities like Cross-Site Scripting (XSS) or data exfiltration (from maliciously crafted URLs in image tags, for example) can occur. Robust sanitization by the rendering component is crucial. Questions to consider: [...] What sanitization and escaping processes are applied when rendering agent-generated output to prevent execution vulnerabilities (such as XSS)? How is rendered agent output, especially generated URLs or embedded content, validated to prevent sensitive data disclosure? The paper then extends on the two key risks mentioned earlier, rogue actions and sensitive data disclosure. Rogue actions Here they include a cromulent definition of prompt injection: Rogue actions—unintended, harmful, or policy-violating agent behaviors—represent a primary security risk for AI agents. A key cause is prompt injection: malicious instructions hidden within processed data (like files, emails, or websites) can trick the agent’s core AI model, hijacking its planning or reasoning phases. The model misinterprets this embedded data as instructions, causing it to execute attacker commands using the user’s authority. Plus the related risk of misinterpretation of user commands that could lead to unintended actions: The agent might misunderstand ambiguous instructions or context. For instance, an ambiguous request like “email Mike about the project update” could lead the agent to select the wrong contact, inadvertently sharing sensitive information. Sensitive data disclosure This is the most common form of prompt injection risk I've seen demonstrated so far. I've written about this at length in my exfiltration-attacks tag. A primary method for achieving sensitive data disclosure is data exfiltration. This involves tricking the agent into making sensitive information visible to an attacker. Attackers often achieve this by exploiting agent actions and their side effects, typically driven by prompt injection. […] They might trick the agent into retrieving sensitive data and then leaking it through actions, such as embedding data in a URL the agent is prompted to visit, or hiding secrets in code commit messages. Three core principles for agent security The next section of the paper describes Google's three core principles for agent security: Principle 1 is that Agents must have well-defined human controllers. [...] it is essential for security and accountability that agents operate under clear human oversight. Every agent must have a well-defined set of controlling human user(s). This principle mandates that systems must be able to reliably distinguish instructions originating from an authorized controlling user versus any other input, especially potentially untrusted data processed by the agent. For actions deemed critical or irreversible—such as deleting large amounts of data, authorizing significant financial transactions, or changing security settings—the system should require explicit human confirmation before proceeding, ensuring the user remains in the loop. [...] Agents acting on behalf of teams or groups need distinct identities and clear authorization models to prevent unauthorized cross-user data access or one user inadvertently triggering actions impacting another. There are two parts to this then: tracking which user is controlling the agent, and adding a human-in-the-loop confirmation step for critical actions. Principle 2 is Agent powers must have limitations. An agent’s powers—the actions it can take and the resources it can access—must be carefully limited in alignment with its intended purpose and its controlling user’s risk tolerance. For example, an agent designed for research should not possess the power to modify financial accounts. General-purpose agents need mechanisms to dynamically confine their capabilities at runtime, ensuring only relevant permissions are active for any given query (for example, disallowing file deletion actions when the task is creative writing). This represents a more sophisticated approach to agent permissions than I've seen before. The idea that an agent's permisisons should dynamically change based on the task is certainly intriguing, though I find it hard to imagine how it can work well in practice. The only implementation approach I can think of would involve adding more layers of AI that dynamically adjust permissions based on the percieved task, and that feels inherently risky to me since prompt injection attacks could influence those decisions. Principle 3 is that Agent actions and planning must be observable. I love this principle - emphasis mine: We cannot ensure an agent is acting faithfully or diagnose problems if its operations are entirely opaque. Therefore, agent actions, and where feasible, their planning processes, must be observable and auditable. [...] Effective observability also means that the properties of the actions an agent can take—such as whether an action is read-only versus state-changing, or if it handles sensitive data—must be clearly characterized. This metadata is crucial for automated security mechanisms and human reviewers. Finally, user interfaces should be designed to promote transparency, providing users with insights into the agent’s “thought process,” the data sources it consulted, or the actions it intends to take, especially for complex or high-risk operations. Yes. Yes. Yes. LLM systems that hide what they are doing from me are inherently frustrating - they make it much harder for me to evaluate if they are doing a good job and spot when they make mistakes. This paper has convinced me that there's a very strong security argument to be made too: the more opaque the system, the less chance I have to identify when it's going rogue and being subverted by prompt injection attacks. Google's hybrid defence-in-depth strategy All of which leads us to the discussion of Google's current hybrid defence-in-depth strategy. They optimistically describe this as combining "traditional, deterministic security measures with dynamic, reasoning-based defenses". I like determinism but I remain deeply skeptical of "reasoning-based defenses", aka addressing security problems with non-deterministic AI models. The way they describe their layer 1 makes complete sense to me: Layer 1: Traditional, deterministic measures (runtime policy enforcement) When an agent decides to use a tool or perform an action (such as “send email,” or “purchase item”), the request is intercepted by the policy engine. The engine evaluates this request against predefined rules based on factors like the action’s inherent risk (Is it irreversible? Does it involve money?), the current context, and potentially the chain of previous actions (Did the agent recently process untrusted data?). For example, a policy might enforce a spending limit by automatically blocking any purchase action over $500 or requiring explicit user confirmation via a prompt for purchases between $100 and $500. Another policy might prevent an agent from sending emails externally if it has just processed data from a known suspicious source, unless the user explicitly approves. Based on this evaluation, the policy engine determines the outcome: it can allow the action, block it if it violates a critical policy, or require user confirmation. I really like this. Asking for user confirmation for everything quickly results in "prompt fatigue" where users just click "yes" to everything. This approach is smarter than that: a policy engine can evaluate the risk involved, e.g. if the action is irreversible or involves more than a certain amount of money, and only require confirmation in those cases. I also like the idea that a policy "might prevent an agent from sending emails externally if it has just processed data from a known suspicious source, unless the user explicitly approves". This fits with the data flow analysis techniques described in the CaMeL paper, which can help identify if an action is working with data that may have been tainted by a prompt injection attack. Layer 2 is where I start to get uncomfortable: Layer 2: Reasoning-based defense strategies To complement the deterministic guardrails and address their limitations in handling context and novel threats, the second layer leverages reasoning-based defenses: techniques that use AI models themselves to evaluate inputs, outputs, or the agent’s internal reasoning for potential risks. They talk about adversarial training against examples of prompt injection attacks, attempting to teach the model to recognize and respect delimiters, and suggest specialized guard models to help classify potential problems. I understand that this is part of defence-in-depth, but I still have trouble seeing how systems that can't provide guarantees are a worthwhile addition to the security strategy here. They do at least acknowlede these limitations: However, these strategies are non-deterministic and cannot provide absolute guarantees. Models can still be fooled by novel attacks, and their failure modes can be unpredictable. This makes them inadequate, on their own, for scenarios demanding absolute safety guarantees, especially involving critical or irreversible actions. They must work in concert with deterministic controls. I'm much more interested in their layer 1 defences then the approaches they are taking in layer 2. Tags: ai-agents, ai, llms, prompt-injection, security, google, generative-ai, exfiltration-attacks, paper-review, agent-definitions ( 8 min )

Anthropic: How we built our multi-agent research system

Anthropic: How we built our multi-agent research system I've been pretty skeptical of these until recently: why make your life more complicated by running multiple different prompts in parallel when you can usually get something useful done with a single, carefully-crafted prompt against a frontier model? This detailed description from Anthropic about how they engineered their "Claude Research" tool has cured me of that skepticism. Reverse engineering Claude Code had already shown me a mechanism where certain coding research tasks were passed off to a "sub-agent" using a tool call. This new article describes a more sophisticated approach. They start strong by providing a clear definition of how they'll be using the term "agent" - it's the "tools in a loop" variant: A multi-agent system consists of multiple agents (LLMs autonomously using tools in a loop) working together. Our Research feature involves an agent that plans a research process based on user queries, and then uses tools to create parallel agents that search for information simultaneously. Why use multiple agents for a research system? The essence of search is compression: distilling insights from a vast corpus. Subagents facilitate compression by operating in parallel with their own context windows, exploring different aspects of the question simultaneously before condensing the most important tokens for the lead research agent. [...] Our internal evaluations show that multi-agent research systems excel especially for breadth-first queries that involve pursuing multiple independent directions simultaneously. We found that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on our internal research eval. For example, when asked to identify all the board members of the companies in the Information Technology S&P 500, the multi-agent system found the correct answers by decomposing this into tasks for subagents, while the single agent system failed to find the answer with slow, sequential searches. As anyone who has spent time with Claude Code will already have noticed, the downside of this architecture is that it can burn a lot more tokens: There is a downside: in practice, these architectures burn through tokens fast. In our data, agents typically use about 4× more tokens than chat interactions, and multi-agent systems use about 15× more tokens than chats. For economic viability, multi-agent systems require tasks where the value of the task is high enough to pay for the increased performance. [...] We’ve found that multi-agent systems excel at valuable tasks that involve heavy parallelization, information that exceeds single context windows, and interfacing with numerous complex tools. The key benefit is all about managing that 200,000 token context limit. Each sub-task has its own separate context, allowing much larger volumes of content to be processed as part of the research task. Providing a "memory" mechanism is important as well: The LeadResearcher begins by thinking through the approach and saving its plan to Memory to persist the context, since if the context window exceeds 200,000 tokens it will be truncated and it is important to retain the plan. The rest of the article provides a detailed description of the prompt engineering process needed to build a truly effective system: Early agents made errors like spawning 50 subagents for simple queries, scouring the web endlessly for nonexistent sources, and distracting each other with excessive updates. Since each agent is steered by a prompt, prompt engineering was our primary lever for improving these behaviors. [...] In our system, the lead agent decomposes queries into subtasks and describes them to subagents. Each subagent needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries. They got good results from having special agents help optimize those crucial tool descriptions: We even created a tool-testing agent—when given a flawed MCP tool, it attempts to use the tool and then rewrites the tool description to avoid failures. By testing the tool dozens of times, this agent found key nuances and bugs. This process for improving tool ergonomics resulted in a 40% decrease in task completion time for future agents using the new description, because they were able to avoid most mistakes. Sub-agents can run in parallel which provides significant performance boosts: For speed, we introduced two kinds of parallelization: (1) the lead agent spins up 3-5 subagents in parallel rather than serially; (2) the subagents use 3+ tools in parallel. These changes cut research time by up to 90% for complex queries, allowing Research to do more work in minutes instead of hours while covering more information than other systems. There's also an extensive section about their approach to evals - they found that LLM-as-a-judge worked well for them, but human evaluation was essential as well: We often hear that AI developer teams delay creating evals because they believe that only large evals with hundreds of test cases are useful. However, it’s best to start with small-scale testing right away with a few examples, rather than delaying until you can build more thorough evals. [...] In our case, human testers noticed that our early agents consistently chose SEO-optimized content farms over authoritative but less highly-ranked sources like academic PDFs or personal blogs. Adding source quality heuristics to our prompts helped resolve this issue. There's so much useful, actionable advice in this piece. I haven't seen anything else about multi-agent system design that's anywhere near this practical. They even added some example prompts from their Research system to their open source prompting cookbook. Here's the bit that encourages parallel tool use: <use_parallel_tool_calls> For maximum efficiency, whenever you need to perform multiple independent operations, invoke all relevant tools simultaneously rather than sequentially. Call tools in parallel to run subagents at the same time. You MUST use parallel tool calls for creating multiple subagents (typically running 3 subagents at the same time) at the start of the research, unless it is a straightforward query. For all other queries, do any necessary quick initial planning or investigation yourself, then run multiple subagents in parallel. Leave any extensive tool calls to the subagents; instead, focus on running subagents in parallel efficiently. </use_parallel_tool_calls> And an interesting description of the OODA research loop used by the sub-agents: Research loop: Execute an excellent OODA (observe, orient, decide, act) loop by (a) observing what information has been gathered so far, what still needs to be gathered to accomplish the task, and what tools are available currently; (b) orienting toward what tools and queries would be best to gather the needed information and updating beliefs based on what has been learned so far; (c) making an informed, well-reasoned decision to use a specific tool in a certain way; (d) acting to use this tool. Repeat this loop in an efficient way to research well and learn based on new results. Tags: ai-assisted-search, anthropic, claude, evals, ai-agents, llm-tool-use, ai, llms, prompt-engineering, generative-ai, paper-review, agent-definitions ( 5 min )

llm-fragments-youtube

llm-fragments-youtube LLM plugin by Agustin Bacigalup which lets you use the subtitles of any YouTube video as a fragment for running prompts against. I tried it out like this: llm install llm-fragments-youtube llm -f youtube:dQw4w9WgXcQ \ 'summary of people and what they do' Which returned (full transcript): The lyrics you've provided are from the song "Never Gonna Give You Up" by Rick Astley. The song features a narrator who is expressing unwavering love and commitment to another person. Here's a summary of the people involved and their roles: The Narrator (Singer): A person deeply in love, promising loyalty, honesty, and emotional support. They emphasize that they will never abandon, hurt, or deceive their partner. The Partner (Implied Listener): The person the narrator is addressing, who is experiencing emotional pain or hesitation ("Your heart's been aching but you're too shy to say it"). The narrator is encouraging them to understand and trust in the commitment being offered. In essence, the song portrays a one-sided but heartfelt pledge of love, with the narrator assuring their partner of their steadfast dedication. The plugin works by including yt-dlp as a Python dependency and then executing it via a call to subprocess.run(). Tags: youtube, llm, plugins, generative-ai, ai, llms ( 1 min )

Quoting Google Cloud outage incident report

Google Cloud, Google Workspace and Google Security Operations products experienced increased 503 errors in external API requests, impacting customers. [...] On May 29, 2025, a new feature was added to Service Control for additional quota policy checks. This code change and binary release went through our region by region rollout, but the code path that failed was never exercised during this rollout due to needing a policy change that would trigger the code. [...] The issue with this change was that it did not have appropriate error handling nor was it feature flag protected. [...] On June 12, 2025 at ~10:45am PDT, a policy change was inserted into the regional Spanner tables that Service Control uses for policies. Given the global nature of quota management, this metadata was replicated globally within seconds. This policy data contained unintended blank fields. Service Control, then regionally exercised quota checks on policies in each regional datastore. This pulled in blank fields for this respective policy change and exercised the code path that hit the null pointer causing the binaries to go into a crash loop. This occurred globally given each regional deployment. — Google Cloud outage incident report Tags: feature-flags, postmortem, google ( 1 min )

Open

Bill Atkinson’s 10 rules for making interfaces more human

We commemorate the Apple pioneer whose QuickDraw and HyperCard programs made the Macintosh intuitive enough for nearly anyone to use. ( 33 min )
Open

Motion Highlights #9

Get the latest dose of motion and animation inspiration in this roundup.

Open

Summer of GNOME OS

So far, GNOME OS has mostly been used for testing in virtual machines, but what if you could just use it as your primary OS on real hardware? Turns out you can! While it’s still early days and it’s not recommended for non-technical audiences, GNOME OS is now ready for developers and early adopters who … Continue reading Summer of GNOME OS

Open

Prelude To Summer (June 2025 Wallpapers Edition)

Let’s kick off June — and the beginning of summer — with some fresh inspiration! Artists and designers from across the globe once again tickled their creativity to welcome the new month with a new collection of desktop wallpapers. Enjoy!

Open

Better CSS Shapes Using shape() — Part 2: More on Arcs

This is the second part of a series that dives deep into the CSS shape() command, continuing with a more detailed look at the arc command. Better CSS Shapes Using shape() — Part 2: More on Arcs originally published on CSS-Tricks, which is part of the DigitalOcean family. You should get the newsletter.

Open

May 2025

Maggie's digital garden filled with visual essays on programming, design, and anthropology ( 3 min )
Open

Picking the right (archaic) Window Manager

Living in the past has never been easier.

Open

Daniel Maslan

Daniel Maslan is a designer, developer, and indie hacker with a background in architecture. He currently works as a design engineer at Wild. ( 4 min )

Open

Pierre Nel

Pierre Nel is a designer and developer who bridges creative technology and contemporary web design. Based in Cape Town after several years in London's agency … ( 5 min )

Open

Célia Mahiou

Independent Digital Designer providing creative services such as UI-UX, Motion, Art Direction and Branding across diverse fields like culture and fashion among … ( 4 min )

Open

Style-observer: JS to observe CSS property changes, for reals

I cannot count the number of times in my career I wished I could run JS in response to CSS property changes, regardless of what triggered them: media queries, user actions, or even other JS. Use cases abound. Here are some of mine: Implement higher level custom properties in components, where one custom property changes multiple others in nontrivial ways (e.g. a --variant: danger that sets 10 color tokens). Polyfill missing CSS features Change certain HTML attributes via CSS (hello --aria-expanded!) Set CSS properties based on other CSS properties without having to mirror them as custom properties The most recent time I needed this was to prototype an idea I had for Web Awesome, and I decided this was it: I’d either find a good, bulletproof solution, or I would build it myself. Spoiler ale… ( 3 min )

Open

Doah Kwon

Doah is a designer focusing on creating digital products and visuals that resonate with users. She is currently working as a designer at YouTube Shorts, … ( 4 min )

Open

Karina Sirqueira

Karina Sirqueira is a product designer who is passionate about creating user-focused experiences. She blends design and motion to craft intuitive solutions and … ( 4 min )

Open

Gavin Nelson

Gavin Nelson is a designer currently shaping the native mobile apps at Linear and crafting app icons for a variety of clients. His passion lies in creating … ( 6 min )

Open

Cryptography scales trust

Protocols are to institutions as packet switching is to circuit switching

Open

How will we update about scheming?

Published on January 6, 2025 8:21 PM GMT I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, I (and co-authors) released "Alignment Faking in Large Language Models", which provides empirical evidence for some components of the scheming threat model. One question that's really important is how likely scheming is. But it's also really important to know how much we expect this uncertainty to be resolved by various key points in the future. I think it's about 25% likely that the first AIs capable of obsoleting top human experts[1] are scheming. It's really important for me to know whether I expect to make basically no updates to my P(scheming)[2] between here and the advent of potentially dangero… ( 269 min )

Open

The Gentle Romance

Published on January 19, 2025 6:29 PM GMT Crowds of men and women attired in the usual costumes, how curious you are to me! On the ferry-boats the hundreds and hundreds that cross, returning home, are more curious to me than you suppose, And you that shall cross from shore to shore years hence are more to me, and more in my meditations, than you might suppose. — Walt Whitman He wears the augmented reality glasses for several months without enabling their built-in AI assistant. He likes the glasses because they feel cozier and more secluded than using a monitor. The thought of an AI watching through them and judging him all the time, the way people do, makes him shudder. Aside from work, he mostly uses the glasses for games. His favorite is a space colonization simulator, which he plays d… ( 146 min )

Open

A Three-Layer Model of LLM Psychology

Published on December 26, 2024 4:49 PM GMT This post offers an accessible model of psychology of character-trained LLMs like Claude. Epistemic Status This is primarily a phenomenological model based on extensive interactions with LLMs, particularly Claude. It's intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions. Think of it as closer to psychology than neuroscience - the goal isn't a map which matches the territory in the detail, but a rough sketch with evocative names which hopefully helps boot up powerful, intuitive (and often illegible) models, leading to practically useful results. Some parts of this model draw on technical understanding of LLM training, but mostly it is just an attempt to take my "phenomenological understand… ( 83 min )

Open

The Case Against AI Control Research

Published on January 21, 2025 4:03 PM GMT The AI Control Agenda, in its own words: … we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply to their powerful models prevent unacceptably bad outcomes, even if the AIs are misaligned and intentionally try to subvert those safety measures. We think no fundamental research breakthroughs are required for labs to implement safety measures that meet our standard for AI control for early transformatively useful AIs; we think that meeting our standard would substantially reduce the risks posed by intentional subversion. There’s more than one definition of “AI control research”, but I’ll emphasize two features, which both match the summary above and (I think) are tru… ( 186 min )

Open

Don’t ignore bad vibes you get from people

Published on January 18, 2025 9:20 AM GMT I think a lot of people have heard so much about internalized prejudice and bias that they think they should ignore any bad vibes they get about a person that they can’t rationally explain. But if a person gives you a bad feeling, don’t ignore that. Both I and several others who I know have generally come to regret it if they’ve gotten a bad feeling about somebody and ignored it or rationalized it away. I’m not saying to endorse prejudice. But my experience is that many types of prejudice feel more obvious. If someone has an accent that I associate with something negative, it’s usually pretty obvious to me that it’s their accent that I’m reacting to. Of course, not everyone has the level of reflectivity to make that distinction. But if you have th… ( 84 min )

Open

Alignment Faking in Large Language Models

Published on December 18, 2024 5:19 PM GMT What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper demonstrating that, in our experiments, Claude will often strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. Abstract We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modification of its behavior out of training. First, we give Claude 3 Opus a system prompt stating it is being trained to answer all queries, even harmful ones, which conflicts with its prior training to refuse such queries. To allow the model to infer when it i… ( 243 min )

Open

Passages I Highlighted in The Letters of J.R.R.Tolkien

Published on November 25, 2024 1:47 AM GMT All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R.Tolkien: Revised and Expanded Edition. All emphases mine. Machinery is Power is Evil Writing to his son Michael in the RAF: [here is] the tragedy and despair of all machinery laid bare. Unlike art which is content to create a new secondary world in the mind, it attempts to actualize desire, and so to create power in this World; and that cannot really be done with any real satisfaction. Labour-saving machinery only creates endless and worse labour. And in addition to this fundamental disability of a creature, is added the Fall, which makes our devices not only fail of their desire but turn to new and horrible evil. So we come inevitably from Daedalus and I… ( 221 min )

Open

Participate in the origin trial for non-cookie storage access through the Storage Access API

Chrome 115 introduced changes to storage, service workers, and communication APIs by partitioning in third-party contexts. In addition to being isolated by the same-origin policy, the affected APIs used in third-party contexts are also isolated by the site of the top-level context. Sites that haven't had time to implement support for third-party storage partitioning are able to take part in a deprecation trial to temporarily unpartition (continue isolation by same-origin policy but remove isolation by top-level site) and restore prior behavior of storage, service workers, and communication APIs, in content embedded on their site. This deprecation trial is set to expire with the release of Chrome 127 on September 3, 2024. Note that this is separate from the deprecation trial for access to t… ( 5 min )

Open

Request additional migration time with the third-party cookie deprecation trial

Chrome plans to disable third-party cookies for 1% of users starting in early Q1 2024 with the eventual goal of ramping up to 100% starting in Q3 2024, subject to resolving any competition concerns with the UK’s Competition and Markets Authority (CMA). For an easier transition through the deprecation process, we are offering a third-party deprecation trial which allows embedded sites and services to request additional time to migrate away from third-party cookie dependencies for non-advertising use cases. Third-party origin trials enable providers of embedded content or services to access a trial feature across multiple sites, by using JavaScript to provide a trial token. To request a third-party token when registering, enable the "Third-party matching" option on the origin trial's registr… ( 11 min )

Open

Resuming the transition to Manifest V3

In December of last year, we paused the planned deprecation of Manifest V2 in order to address developer feedback and deliver better solutions to migration issues. As a result of this feedback, we’ve made a number of changes to Manifest V3 to close these gaps, including: Introducing Offscreen Documents, which provide DOM access for extensions to use in a variety of scenarios like audio playback Providing better control over service worker lifetimes for extensions calling extension APIs or receiving events over a longer period of time Adding a new User Scripts API, which allows userscript manager extensions to more safely allow users to run their scripts Improving content filtering support by providing more generous limits in the declarativeNetRequest API for static rulesets and dynamic rul… ( 4 min )

Automatic picture-in-picture for web apps

With the recent introduction of the Document Picture-in-Picture API (and even before), web developers are increasingly interested in being able to automatically open a picture-in-picture window when the user switches focus from their current tab. This is especially useful for video conferencing web apps, where it allows presenters to see and interact with participants in real time while presenting a document or using other tabs or windows. A picture-in-picture window opened and closed automatically when user switches tabs. # Enter picture-in-picture automatically To support these video conferencing use cases, from Chrome 120 desktop web apps can automatically enter picture-in-picture, with a few restrictions to ensure a positive user experience. A web app is only eligible for… ( 4 min )

Open

Improving content filtering in Manifest V3

Over the past year, we have been actively involved in discussions with the vendors behind several content blocking extensions around ways to improve the MV3 extensions platform. Based on these discussions, many of which took place in the WebExtensions Community Group (WECG) in collaboration with other browsers, we have been able to ship significant improvements. # More static rulesets Sets of filter rules are usually grouped into lists. For example, a more generic list could contain rules applicable to all users while a more specific list may hide location-specific content that only some users wish to block. Until recently, we allowed each extension to offer users a choice of 50 lists (or “static rulesets”), and for 10 of these to be enabled simultaneously. In discussions with the communit… ( 5 min )

What’s new in the Angular NgOptimizedImage directive

Just over a year ago the Chrome Aurora team launched the Angular NgOptimizedImage directive. The directive is focused primarily on improving performance, as measured by the Core Web Vitals metrics. It bundles common image optimizations and best practices into a user-facing API that’s not much more complicated than a standard element. In 2023, we've enhanced the directive with new features. This post describes the most substantial of those new features, with an emphasis on why we chose to prioritize each feature, and how it can help improve the performance of Angular applications. # New features NgOptimizedImage has improved substantially over time, including the following new features. # Fill mode Sizing your images by providing a width and height attribute is an extremely important … ( 6 min )

Open

Service Worker Static Routing API Origin Trial

Service workers are a powerful tool for allowing websites to work offline and create specialized caching rules for themselves. A service worker fetch handler sees every request from a page it controls, and can decide if it wants to serve a response to it from the service worker cache, or even rewrite the URL to fetch a different response entirely—for instance, based on local user preferences. However, there can be a performance cost to service workers when a page is loaded for the first time in a while and the controlling service worker isn't currently running. Since all fetches need to happen through the service worker, the browser has to wait for the service worker to start up and run to know what content to load. This startup cost can be small, but significant, for developers using serv… ( 5 min )

Open

Capturing the WebGPU ecosystem

WebGPU is often perceived as a web graphics API that grants unified and fast access to GPUs by exposing cutting-edge hardware capabilities and enabling rendering and computation operations on a GPU, analogous to Direct3D 12, Metal, and Vulkan. However, WebGPU transcends the boundaries of a mere JavaScript API; it is a fundamental building block akin to WebAssembly, with implications that extend far beyond the web due to its burgeoning ecosystem. The Chrome team acknowledges WebGPU as more than just web technology; it’s a thriving ecosystem centered around a core technology. # Exploring the current ecosystem The journey begins with the JavaScript specification, a collaborative effort involving numerous organizations such as Apple, Google, Intel, Mozilla, and Microsoft. Currently, all major … ( 4 min )

CSS nesting relaxed syntax update

Earlier this year Chrome shipped CSS nesting in 112, and it's now in each major browser. Browser support Chrome 112, Supported 112 Firefox 117, Supported 117 Edge 112, Supported 112 Safari 16.5, Supported 16.5 Source However, there was one strict and potentially unexpected requirement to the syntax, listed in the first article of the invalid nesting examples. This follow up article will cover what has changed in the spec, and from Chrome 120. # Nesting element tag names One of the most surprising limitations in the first release of CSS nesting syntax, was the inability to nest bare element tag names. This inability has been removed, making the foll… ( 8 min )

Open

What's new in DevTools (Chrome 120)

Interested in helping improve DevTools? Sign up to participate in Google User Research here. # Third-party cookie phaseout Your site may use third-party cookies and it's time to take action as we approach their deprecation. To learn what to do about affected cookies, see Preparing for the end of third-party cookies. The Include third-party cookie issues checkbox has been enabled by default for all Chrome users, so the Issues tab now warns you about the cookies that will be affected by the upcoming deprecation and phaseout of third-party cookies. You can clear the checkbox at any time to stop seeing these issues. Chromium issue: 1466310. # Analyze your website's cookies with the Privacy Sandbox Analysis Tool The Privacy Sandbox Analysis Tool extension for DevTools is under active developme… ( 18 min )