Starter AI

Claude overtakes GPT-4o, Wayve’s PRISM-1, Are LLMs lying to us?

This time, Anthropic takes the lead.

Hello, Starters!

We might believe that OpenAI is safe at the number one spot in the AI race; after all, they've got the best models, right? Well, let's take a closer look, as here comes Anthropic.

Here’s what you’ll find today:

  • Anthropic unveils Claude 3.5 Sonnet

  • Wayve presents PRISM-1

  • How to tell if an LLM is prone to confabulation

  • Embracer Group’s approach to AI

  • DeepMind reveals AI isn’t a great comedian

  • And more.

Anthropic now has what may be the most capable model in the industry. Claude 3.5 Sonnet has officially been released, and on many benchmarks it leaves most competitors behind, beating GPT-4o in certain areas. The best part? All this capability is also cost-effective, which appeals to Anthropic's main market: businesses.

Claude 3.5 Sonnet has a 200k-token context window and is the company's strongest vision model yet. It also introduces "Artifacts," a feature that displays generated content such as code snippets and documents in a dedicated window alongside the conversation, giving users more control over what Claude produces. The latest iteration is available for free on Claude.ai and the Claude iOS app.

London-based startup Wayve is working to deliver the missing piece autonomous driving needs to reach new heights. They've introduced PRISM-1, an AI model capable of reconstructing three-dimensional urban scenarios using vehicle camera footage. The representations are highly realistic and elaborate, including moving elements like cyclists, pedestrians, traffic lights, and more.

Wayve already has "Ghost Gym," a driving simulator, which, mixed with PRISM-1, can further improve how the startup trains and tests driving models, bringing its innovation to a variety of vehicle platforms and camera setups in the future.

Large language models often answer fluently without having their facts right. A group of researchers from the University of Oxford focuses on one such failure, "confabulation": answers that are both wrong and arbitrary, varying from one generation to the next for the same query. To flag these errors and make LLMs more reliable, they've come up with a measure called "semantic entropy."

Through this approach, they sample a series of possible answers to a question, group the answers by meaning, and compute the entropy over those meaning clusters. High semantic entropy signals a likely confabulation, while low entropy suggests the model is answering consistently and is more likely correct.
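As a rough sketch of the idea (not the authors' implementation), the steps above can be written in a few lines of Python. The paper clusters answers via bidirectional entailment using an NLI model; here a simple case-insensitive exact-match predicate stands in for that, and the `semantic_entropy` helper name is our own:

```python
import math

def semantic_entropy(answers, same_meaning=None):
    """Estimate semantic entropy over answers sampled for one question.

    answers: list of strings sampled from the model.
    same_meaning: predicate deciding whether two answers share a meaning.
        The paper uses bidirectional entailment with an NLI model; as a
        simplistic stand-in we default to case-insensitive exact match.
    """
    if same_meaning is None:
        same_meaning = lambda a, b: a.strip().lower() == b.strip().lower()

    # Greedily group answers into meaning clusters.
    clusters = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    # Shannon entropy over the cluster probabilities.
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
```

If every sampled answer means the same thing, the entropy is 0 (the model is consistent); if all answers disagree, it is maximal, hinting at confabulation.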

🕹️Embracer Group, a Swedish video game company, shares its approach to AI in its annual report, saying it doesn't want the technology to replace humans. Instead, it believes AI can "empower" developers and open more pathways to game creation, and it sees the technology as a way of staying competitive in the field.

😅A recent study by Google DeepMind found that AI isn't funny enough to be a comedian. The research, which involved 20 professional comedians, used tools like ChatGPT and the former Google Bard to write jokes. The verdict? Stereotypical, bland, and dated. For now, AI should stay off the standup stage.

What did you think of today's newsletter?


Thank you for reading!