
Microsoft’s DenseAV, Introducing: Driving robots, Luma AI’s Dream Machine

Today's focus: AI videos.

Hello, Starters!

As we patiently wait for Sora, video generation continues to grow into one of the trendiest fields in AI development. And it doesn't stop there: videos are also becoming a key training asset for upcoming AI models. We live in cinematic times!

Here’s what you’ll find today:

  • DenseAV: Learning language from videos

  • Humanoid robots are driving

  • Luma AI presents “Dream Machine”

  • OpenAI and Oracle team up

  • AI can help us understand dogs!

  • And more.

Microsoft Research and Mark Hamilton, an MIT Ph.D. student, have unveiled DenseAV, an algorithm that learns the meaning of language and the location of sounds purely from unlabeled videos. The model trains itself on millions of videos and can work out what the footage is about without human intervention, and it doesn't require text prompts to do so.

DenseAV can also tell sounds apart from the spoken words that refer to them. For example, the word "dog" and the sound of a bark are closely related, which makes them hard to pull apart. DenseAV, however, learns to separate sound features from language features without any prior knowledge.
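The intuition behind this kind of self-supervised audio-visual learning is contrastive: the audio track of a clip should match that clip's visuals better than any other clip's. Here is a minimal toy sketch of an InfoNCE-style pairing loss under that assumption; the random vectors and helper functions are illustrative placeholders, not DenseAV's actual architecture or code:

```python
import math
import random

random.seed(0)

dim, n = 8, 4
# Toy embeddings: each "clip" has a visual vector and a paired audio vector
# that is a noisy copy of it. Real systems produce these with deep encoders.
visual = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
audio = [[v + 0.1 * random.gauss(0, 1) for v in vec] for vec in visual]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    norm = math.sqrt(dot(x, x))
    return [v / norm for v in x]

def contrastive_loss(visual, audio, temperature=0.1):
    """InfoNCE-style loss: each clip's audio should be most similar to its
    own visual embedding, not to other clips' embeddings in the batch."""
    v = [normalize(x) for x in visual]
    a = [normalize(x) for x in audio]
    total = 0.0
    for i in range(len(v)):
        logits = [dot(v[i], a[j]) / temperature for j in range(len(a))]
        m = max(logits)  # stabilize the log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -(logits[i] - log_z)  # negative log-prob of the true pair
    return total / len(v)

# Correctly paired clips produce a lower loss than mismatched ones.
matched = contrastive_loss(visual, audio)
shuffled = contrastive_loss(visual, audio[::-1])
print(matched < shuffled)  # → True
```

Minimizing a loss like this over millions of clips is what lets a model associate, say, the pixels of a dog with the sound of a bark, with no labels involved.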

Autonomous driving still has plenty of room for improvement, and a team of researchers from the University of Tokyo thinks humanoid robots may be part of the solution. They've developed a "musculoskeletal humanoid" and trained it to drive a small electric car on a test track.

The robot, called Musashi, mimics the human body and can turn the car's key, turn the steering wheel, and press the brake pedal. In a successful demo, it turned a corner at an intersection. Researchers believe there's still work to do, but they're committed to developing a next-gen robot and software.

The announcement of Sora has prompted a surge in the video generation community, and while it isn't yet available to the public, more startups are showing off their own offerings. Luma Labs is one of them and has recently released Dream Machine, a model built on a multimodal transformer architecture, capable of creating high-quality videos from text prompts and images.

Dream Machine is currently available for testing, and the results are outstanding, allowing users to mix a series of camera movements for cinematic outputs. Although the video length is limited to five seconds, Dream Machine seems like a worthy competitor to OpenAI's Sora.

💥OpenAI's CEO, Sam Altman, has been candid about the company's need for more compute capacity. In addition to their current partnership with Microsoft, they have announced a collaboration with Oracle, whose chips will be used to scale OpenAI's endeavours and power ChatGPT's increasing demand.

🐶A research effort between the University of Michigan and Mexico's National Institute of Astrophysics, Optics, and Electronics is leveraging AI to decipher the meaning behind a dog's barking and determine whether it is playful or aggressive. They're using models trained on human speech to develop new systems for understanding animal communication.

What did you think of today's newsletter?


Thank you for reading!