TEXXR

Chronicles

The story behind the story

Source: OpenAI engineers earlier this month told some colleagues they had figured out a way to more than halve the cost of inference

We closely track efforts by Anthropic, Google and OpenAI to get access to more server chips to run their models. But we don't talk enough about the work …

The Information · Stephanie Palazzolo

Discussion

  • @steph_palazzolo Stephanie Palazzolo on X
    OpenAI engineers earlier this month developed an optimization that cut inference costs in half for models it was applied to. After the optimization was applied to logged-out ChatGPT traffic, it reduced the number of GPUs needed to power that traffic to a couple hundred.
  • r/singularity on Reddit
    OpenAI has reportedly found a way to cut inference costs in half