
Google unveils Gemini Live, SWE-bench gets upgraded, Anthropic’s Prompt Caching

OpenAI needs to hurry.

Hello, Starters!

When you see companies competing with one another to ship breakthroughs faster and faster, you understand that calling it the "AI race" isn't just a catchy headline.

Here’s what you’ll find today:

  • Gemini Live comes to transform smartphones

  • OpenAI introduces “SWE-bench Verified”

  • Anthropic presents Prompt Caching

  • There’s a new GPT-4o model for ChatGPT

  • MIT’s AI risk repository

  • And more.

As the wait for OpenAI's Advanced Voice Mode and Apple's revamp of Siri continues, Google has decided to take the lead with Gemini Live. This mobile AI assistant takes over your phone and offers a full conversational experience: you can ask questions, command it to set reminders, find emails, play music, and more. It even keeps running in the background, so it works when your phone is locked.

This is not the first time Google has gained an advantage over its competitors; they've already made it clear in their conferences that infusing AI into their products is one of their main goals. And, unlike OpenAI, they've already started rolling out Gemini Live to Gemini Advanced subscribers on Android, with iOS coming in the next few weeks.

SWE-bench is a popular benchmark in the AI field, as it tests models' ability to solve real software issues found on GitHub, a key capability for their development as autonomous systems. As part of their Preparedness Framework project, OpenAI has worked with the authors of SWE-bench to update the benchmark and make it more reliable and accurate.

This is how SWE-bench Verified was born. This upgraded benchmark is a subset of SWE-bench's original test set, and it includes 500 verified samples that have been catalogued as non-problematic by a group of software developers.

Anthropic has launched "Prompt Caching" for developers in its API. This feature lets users cache context they use frequently, so it can be reused across requests instead of being resent every time. With this approach, costs can be cut by up to 90%, and the feature is already available in beta for Claude 3.5 Sonnet and Claude 3 Haiku.

As Anthropic points out, prompt caching has proven to be useful in situations where large amounts of context are involved, such as conversational agents, coding assistants, detailed instruction sets, and more.
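As a rough sketch of how this looks in practice, the snippet below builds a Messages API request whose large system prompt is marked as cacheable. The field names (`cache_control` with type `ephemeral`, and the `anthropic-beta: prompt-caching-2024-07-31` header) follow Anthropic's beta documentation at launch, so treat them as assumptions rather than a guaranteed-stable API; the model name is likewise illustrative.

```python
# Sketch of an Anthropic Messages API request using Prompt Caching.
# Field names ("cache_control", the beta header) follow Anthropic's
# beta docs at launch and may change -- treat them as assumptions.
import json

# Stand-in for a large, frequently reused context (a codebase,
# a long instruction set, reference documents, etc.).
LONG_CONTEXT = "...large shared instructions or documents here..." * 100

def build_cached_request(user_question: str) -> dict:
    """Build a request body whose system prompt is marked cacheable.

    The "cache_control" marker tells the API to cache the prompt prefix
    up to and including this block, so later requests repeating the same
    prefix read it from the cache instead of reprocessing it in full.
    """
    return {
        "model": "claude-3-5-sonnet-20240620",  # illustrative model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_CONTEXT,
                "cache_control": {"type": "ephemeral"},  # cache this prefix
            }
        ],
        "messages": [{"role": "user", "content": user_question}],
    }

# Headers for the raw HTTP endpoint; the beta flag was required
# when the feature launched.
HEADERS = {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "prompt-caching-2024-07-31",
    # "x-api-key": "...",  # your API key goes here
}

# Serialise the body; a real call would POST this to the API with HEADERS.
body = json.dumps(build_cached_request("Summarise the key instructions."))
```

The idea is that only the first request pays full price to process `LONG_CONTEXT`; follow-up requests that repeat the same cached prefix pay the much cheaper cache-read rate, which is where the cost savings come from.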

🤖Last week, OpenAI launched a version of GPT-4o optimised for API usage. Now they've also disclosed that an updated version of the model is powering ChatGPT, one that was tested on LMSYS as "anonymous-chatbot" and showed great performance at tasks like coding, instruction-following, and hard prompts.

📚A group of researchers at MIT is currently working on an AI "risk repository": a database aimed at companies, individuals, and even governments as a guideline for designing AI regulations. Since every area where AI could be used carries its own kinds of risks, those risks should be cautiously analysed.

What did you think of today's newsletter?


Thank you for reading!