VOICE ARCHIVE

@artificialanlys

66 posts
2026-03-04
Google has released Gemini 3.1 Flash-Lite Preview! … Key takeaways:
➤ Improved intelligence over Gemini 2.5 Flash-Lite: @GoogleDeepMind's Gemini 3.1 Flash-Lite Preview scores 34 on the Artificial Analysis Intelligence Index, up 12 points from Gemini 2.5 Flash-Lite (09-25). However, Gemini 3.1 Flash-Lite Preview had limited gains in tool use capabilities, matching Gemini 2.5 Flash-Lite (09-25) on Tau2-Telecom with 31% and scoring 958 on GDPval-AA, 12 points behind gpt-oss-120b (high).
➤ Leading speed and latency: Gemini 3.1 Flash-Lite Preview maintains the same high speeds and low latency as Gemini 2.5 Flash-Lite (09-25), measuring at over 360 output tokens/s with an average answer latency of 5.1s. To measure latency for reasoning models, we use time to first answer token, which accounts for both prefill processing and thinking time.
2026-03-04 View on X
The Keyword

Google launches Gemini 3.1 Flash-Lite, which it says delivers “enhanced performance at a fraction of the cost of larger models” and outperforms Gemini 2.5 Flash

Get best-in-class intelligence for your highest-volume workloads. … Today, we're introducing Gemini 3.1 Flash-Lite …
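The latency methodology mentioned in the post above (time to first answer token, covering both prefill processing and thinking time) can be sketched generically. The `(kind, token)` stream interface below is a hypothetical stand-in for illustration, not any vendor's actual streaming API:

```python
import time

def measure_stream_metrics(stream):
    """Measure time-to-first-answer-token and decode throughput for a
    streamed model response.

    `stream` is any iterator yielding (kind, token) pairs, where kind is
    "thinking" or "answer" -- a hypothetical interface, not a real API.
    For reasoning models, the clock to the first *answer* token includes
    both prefill processing and thinking time.
    """
    start = time.monotonic()
    first_answer_at = None  # seconds until the first answer token
    answer_tokens = 0
    for kind, _token in stream:
        now = time.monotonic()
        if kind == "answer":
            if first_answer_at is None:
                first_answer_at = now - start  # prefill + thinking time
            answer_tokens += 1
    total = time.monotonic() - start
    # Output tokens/sec over the decode phase (after the first answer token).
    if first_answer_at is not None and total > first_answer_at:
        tokens_per_sec = answer_tokens / (total - first_answer_at)
    else:
        tokens_per_sec = 0.0
    return first_answer_at, tokens_per_sec
```

Measuring only time to first token would ignore thinking time entirely, which is why time to first answer token is the fairer latency proxy for reasoning models.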


2026-02-21
Google is once again the leader in AI: Gemini 3.1 Pro Preview leads the Artificial Analysis Intelligence Index, 4 points ahead of Claude Opus 4.6 while costing less than half as much to run. @GoogleDeepMind gave us pre-release access to Gemini 3.1 Pro Preview. It leads 6 of the 10 evaluations that make up the Artificial Analysis Intelligence Index and improves significantly over Gemini 3 Pro Preview across capabilities, with the biggest gains in reasoning and knowledge, coding, and hallucination reduction.
2026-02-21 View on X
9to5Google

Google rolls out Gemini 3.1 Pro, which it says is “a step forward in core reasoning”, for all users in the Gemini app; the .1 increment is a first for Google

In November, Google introduced Gemini 3 Pro in preview, with Gemini 3 Flash following a month later.

2026-02-18
The performance and token use increases for Claude Sonnet 4.6 mean that it is now clustered with Opus 4.6 on the ELO vs. Cost to Run curve despite 40% lower per-token prices. Sonnet is back at the Pareto frontier, but now positioned at a higher cost and performance point while [image]
2026-02-18 View on X
Anthropic

Anthropic launches Claude Sonnet 4.6 with improvements in coding, computer use, instruction following, and more; it features a 1M token context window in beta

Claude Sonnet 4.6 is our most capable Sonnet model yet.  It's a full upgrade of the model's skills across coding, computer use …

Claude Sonnet 4.6 is the new leader in GDPval-AA, slightly ahead of Anthropic's Opus 4.6 on agentic performance of real-world knowledge work tasks, less than two weeks after its launch. In our pre-release testing with @AnthropicAI, Sonnet 4.6 reached an ELO of 1633 using the [image]
2026-02-18 View on X

Claude Sonnet 4.6 substantially improves on the aesthetic capabilities of Sonnet 4.5 for tasks like presentation and document generation in GDPval-AA. While we see effective analysis, and in some cases content similarities, between the two versions, the visual elements are [image]
2026-02-18 View on X

2026-02-12
GLM-5 is on the Pareto curve of the Intelligence vs. Cost to Run the Intelligence Index chart, driven by lower per-token pricing compared to proprietary peers (e.g. Claude Opus, Google Gemini and OpenAI GPT-5.2). GLM-5 cost ~$547 (based on the median per token price of [image]
2026-02-12 View on X
Z.ai

Z.ai launches GLM-5, saying its flagship open-weight model has “best-in-class performance among all open-source models” in reasoning, coding, and agentic tasks

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks.  Scaling is still one of the most important ways …
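Pareto-frontier claims like the one above (intelligence vs. cost to run) reduce to a simple dominance test: a model is off the frontier if some other model scores at least as high for no more cost, and strictly better on one axis. The model names and numbers below are illustrative placeholders, not benchmark data:

```python
def pareto_frontier(models):
    """Return the names of models not dominated on (score, cost).

    `models` maps name -> (score, cost). A model is dominated if another
    model scores at least as high at equal-or-lower cost, and is strictly
    better on at least one of the two axes.
    """
    frontier = []
    for name, (score, cost) in models.items():
        dominated = any(
            s >= score and c <= cost and (s > score or c < cost)
            for other, (s, c) in models.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)

# Hypothetical example: "C" is dominated by "B" (lower score, higher cost).
example = {"A": (60, 900), "B": (50, 400), "C": (45, 500)}
```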

GLM-5 uses fewer output tokens than GLM-4.7 to run the Artificial Analysis Intelligence Index [image]
2026-02-12 View on X

GLM-5 is the new leading open weights model! GLM-5 leads the Artificial Analysis Intelligence Index amongst open weights models and makes large gains over GLM-4.7 in GDPval-AA, our agentic benchmark focused on economically valuable work tasks. GLM-5 is @Zai_org's first new [image]
2026-02-12 View on X
Reuters

Z.ai says it will raise prices by at least 30% for new GLM coding plan subscribers to accommodate surging demand for its AI coding tools

GLM-5 demonstrates improvement in AA-Omniscience Index, driven by lower hallucination. This means the model is abstaining more from answering questions it does not know [image]
2026-02-12 View on X
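The abstention behaviour described above, where a model scores better by declining to answer questions it does not know rather than hallucinating, can be illustrated with a toy scoring rule. This is a generic sketch of an abstention-aware metric, not the actual AA-Omniscience formula:

```python
def abstention_aware_score(results):
    """Score a list of outcomes, each "correct", "wrong", or "abstain".

    Correct answers gain a point, wrong answers (hallucinations) lose a
    point, and abstentions are neutral. Under this rule a model that
    abstains on unknown questions outscores one that guesses and is
    often wrong. Returns the mean score, in [-1, 1].
    """
    points = {"correct": 1, "wrong": -1, "abstain": 0}
    return sum(points[r] for r in results) / len(results)
```

With this rule, a model that answers 5 of 10 questions correctly and guesses wrong on the rest scores 0.0, while one that answers the same 5 correctly but abstains on the rest scores 0.5.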


2026-01-27
Moonshot's Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier, with only OpenAI, Anthropic and Google models ahead. Key takeaways:
➤ Impressive performance on agentic tasks: @Kimi_Moonshot's Kimi K2.5 achieves an Elo of 1309 on our GDPval-AA [image]
2026-01-27 View on X
Kimi

Moonshot says Kimi K2.5 builds on K2 with “pretraining over ~15T mixed visual and text tokens” and “can self-direct an agent swarm with up to 100 sub-agents”

Today, we are introducing Kimi K2.5, the most powerful open-source model to date.