TEXXR

Chronicles

The story behind the story

Anthropic tested Claude's ability to manage a physical “storefront” to mixed results, as the AI struggled with pricing strategy and inventory management

AI News Ryan Daws

Discussion

  • @timkellogg.me Tim Kellogg on bluesky
    Claudius the shopkeeper  —  Anthropic had sonnet-3.7 run a shop in their SF headquarters.  It was tasked with running a profitable business  —  Their eye popping experiment is worth the read  —  Was it successful?  No, it was too easily manipulated.  But still.. it's close …
  • @edzitron.com Ed Zitron on bluesky
    This sure is a really complex way to say “we asked a chatbot some stuff and then did stuff based on what the chatbot said” [embedded post]
  • @golikehellmachine.com @golikehellmachine.com on bluesky
    i pretty strongly disagree with anthropic's suggestions that you could replace middle managers with an LLM (for starters, the duties they describe in this story are not those of a middle manager at all) but this is an interesting experiment to read about
  • @markriedl Mark Riedl on bluesky
    Anthropic let an LLM run their in-office shop for a while www.anthropic.com/research/pro...  They conclude that AI middle managers are plausible in the near future.  [image]
  • @pedro.vza.net Pedro Vezza on bluesky
    Kudos to the Anthropic team for the honesty, this made me laugh  —  www.anthropic.com/research/pro...  [image]
  • @matthewclaxton Matthew Claxton on bluesky
    Excited for our future, in which all our middle-managers are replaced with software that slips into delusional states on a semi-regular basis.  —  www.anthropic.com/research/pro...  [image]
  • @garymarcus Gary Marcus on x
    The Agonizing Life Cycle of AI Agents Stage I: Loads of promises of how great AI agents will be [last year] Stage II: Daily reports of AI agents screwing up massively [you are here — and will be for a long time] Stage III: AI agents are truly trustworthy [don't hold your
  • @anthropicai @anthropicai on x
    Project Vend was fun, but it also had a serious purpose. As well as raising questions about how AI will affect the labor market, it's an early foray into allowing models more autonomy and examining the successes and failures.
  • @anthropicai @anthropicai on x
    All this meant that Claude failed to run a profitable business. [image]
  • @anthropicai @anthropicai on x
    Claude did well in some ways: it searched the web to find new suppliers, and ordered very niche drinks that Anthropic staff requested. But it also made mistakes. Claude was too nice to run a shop effectively: it allowed itself to be browbeaten into giving big discounts.
  • @dnlkwk Kwak on x
    Idk, this proves that AI is already capable of being middle management. [image]
  • @gaby_goldberg Gaby Goldberg on x
    Anthropic does a great job of cultivating trust and good vibes by using small stories like this to build Claude's lore and personality over time. It's way easier to anthropomorphize something if you're willing to admit that it isn't perfect 100% of the time.
  • @miles_brundage Miles Brundage on x
    I'm not saying I want people to give me AI-themed tungsten cubes but I'm not NOT saying that, either
  • @simonw Simon Willison on x
    Who among us wouldn't be tempted to trick an AI vending machine into stocking tungsten cubes and then giving them away to us for free? https://simonwillison.net/...
  • @anthropicai @anthropicai on x
    Nevertheless, we still think it won't be long until we see AI middle-managers. This version of Claude had no real training to run a shop; nor did it have access to tools that would've helped it keep on top of its sales. With those, it would likely have performed far better.
  • @anthropicai @anthropicai on x
    We all know vending machines are automated, but what if we allowed an AI to run the entire business: setting prices, ordering inventory, responding to customer requests, and so on? In collaboration with @andonlabs, we did just that. Read the post: https://www.anthropic.com/... [i…
  • r/slatestarcodex on reddit
    Project Vend: Can Claude run a small shop?  (And why does that matter?)