This AI Just Defeated ChatGPT? New Grok 3 Model Shines in Benchmarks
A recap of the biggest AI news of the week, including the release of ChatGPT challenger Grok 3, advances in open AI models, and new tools and research from tech giants like Google, Microsoft, and Meta. Highlights key benchmarks, capabilities, and potential implications of these AI developments.
February 24, 2025

Discover the latest advancements in AI, including the impressive Grok 3 model, open-source AI models, and cutting-edge research in areas like scientific collaboration and unsupervised learning. Stay informed on the evolving landscape of AI and its practical applications across industries.
Grok 3 Dominates LLM Benchmarks
Perplexity Releases Uncensored R1 1776 Model
OpenAI Aims for Less Censorship in New Models
Microsoft Prepares for GPT-4.5 and GPT-5 Launch
New AI Capabilities in the Microsoft Store
Google's AI Co-Scientist and Breakthrough Discoveries
Google Releases PaliGemma 2 Mix for Multi-Modal Tasks
Mira Murati Starts New AI Company 'Thinking Machines Lab'
Self-Improving AI 'Torque Clustering' Unveiled
Muse: Generative AI for Game Development
Pika Swap: Swapping Objects in Videos
Alibaba's 'Animate Anyone' AI for Face Swapping
Nvidia Launches ASL Learning Platform
Apple Debuts Less Expensive iPhone 16e
Humanoid Robots from Meta, Figure, and Clone
Conclusion
Grok 3 Dominates LLM Benchmarks
Grok 3, the latest large language model (LLM) from xAI, has made a significant impact in the AI world. According to the LM Arena leaderboard, Grok 3 (which competed under the codename "chocolate") is currently ranked as the number one LLM.
The model has demonstrated impressive performance across various benchmarks:
- Competition math and science: Grok 3 outperforms other state-of-the-art models on competition math and on GPQA, a benchmark of graduate-level, Google-proof questions and answers.
- Coding: Grok 3 beats out all other models on the LiveCodeBench benchmark, showcasing its exceptional code-generation capabilities.
- Multimodal understanding: Grok 3 matches the performance of other leading models on multimodal understanding tasks.
Interestingly, the published benchmarks did not include comparisons to some notable competitors, such as Anthropic's Claude models, which are widely used for AI-powered coding.
Despite the impressive results, some users may be hesitant to use Grok 3 due to its association with Elon Musk's company, xAI. However, the model's data-driven performance is hard to ignore, and it has received praise from industry experts like Andrej Karpathy, who previously worked at OpenAI.
Grok 3 also features a "Deep Search" mode, which allows the model to search the web and gather information to provide more comprehensive responses, and a "Think" mode, which uses a chain-of-thought reasoning approach to double-check and refine its answers.
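For developers who want to experiment, xAI exposes an OpenAI-compatible API. The snippet below is a minimal sketch only: the `grok-3` model id and its API availability are assumptions on my part, not something this article confirms, so check xAI's documentation for current model names.

```python
# Hedged sketch: querying Grok via xAI's OpenAI-compatible endpoint.
# The "grok-3" model id below is an assumption, not a confirmed identifier.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_XAI_API_KEY",        # placeholder credential
    base_url="https://api.x.ai/v1",    # xAI's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="grok-3",                    # hypothetical model id
    messages=[
        {"role": "user", "content": "Think step by step: is 2^31 - 1 prime?"}
    ],
)
print(response.choices[0].message.content)
```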
The launch of Grok 3 also included a sneak peek of the model's voice capabilities, which Elon Musk claims will understand emotions, inflection, and pacing, making its interactions more natural and human-like.
Overall, Grok 3 has made a significant impact on the LLM landscape, and its performance across various benchmarks suggests it is a highly capable model. As the field continues to evolve, it will be interesting to see how Grok 3 and other LLMs compete and advance.
Perplexity Releases Uncensored R1 1776 Model
Perplexity has released its R1 1776 model, a version of the DeepSeek-R1 model that has been post-trained to remove censorship and provide more accurate, factual information.
The key differences are:
- R1 1776 is less censored than the original DeepSeek-R1 model. When asked about controversial topics, R1 1776 provides more detailed and uncensored responses.
- For example, when asked about the events in Tiananmen Square in 1989, R1 1776 gave a proper factual answer, while DeepSeek-R1 refused to respond.
- Perplexity has made the R1 1776 model weights available on Hugging Face, allowing anyone to use and build upon the model (see the sketch after this list).
- This release is part of a broader shift in the AI industry towards models that are less censored and more willing to engage with challenging or controversial topics, in an effort to promote intellectual freedom.
- However, this approach is also controversial, as it means the AI assistant may remain neutral on topics that some consider morally wrong or offensive.
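Because the weights are public, fetching them takes a single call to the Hugging Face Hub client. This is a minimal sketch assuming the repo id `perplexity-ai/r1-1776`; keep in mind the model is a post-trained DeepSeek-R1, so the full download runs to hundreds of gigabytes and serving it requires a multi-GPU inference stack.

```python
# Minimal sketch: fetching the open R1 1776 weights from Hugging Face.
# The repo id is an assumption based on the release; the download is huge,
# so this only demonstrates the API surface, not a practical local setup.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="perplexity-ai/r1-1776")
print(f"Weights downloaded to: {local_dir}")
```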
Overall, the R1 1776 model represents an effort to create large language models that are less censored and willing to engage with a wider range of topics, while still grappling with the ethical challenges this presents.
OpenAI Aims for Less Censorship in New Models
According to a TechCrunch article, OpenAI is changing how it trains AI models to "explicitly embrace intellectual freedom no matter how challenging or controversial a topic may be." This shift may be part of OpenAI's efforts to align with the new Trump administration, as well as a broader move in Silicon Valley towards less censorship in AI assistants.
The principle behind this change is that the goal of an AI assistant should be to assist humanity, not to shape it. This means the assistant may remain neutral on topics that some consider morally wrong or offensive.
The article notes that these changes might result in OpenAI's models answering more questions and refusing fewer, although the timeline for these changes is unclear. The goal seems to be AI assistants that are less censored and more willing to engage with a wider range of topics, even controversial ones.
This shift by OpenAI towards less censorship in its models is a notable development in the ongoing effort to create AI assistants that are more open, transparent, and aligned with the goal of truly assisting users rather than shaping their views.
Microsoft Prepares for GPT-4.5 and GPT-5 Launch
According to a report from The Verge, Microsoft engineers are currently readying server capacity for OpenAI's upcoming GPT-4.5 and GPT-5 models. A source familiar with the company's plans indicates that GPT-4.5 could arrive as soon as next week, while GPT-5 is expected in late May.
GPT-4.5 is expected to be OpenAI's next non-reasoning model, without chain-of-thought capabilities. GPT-5, however, is said to consolidate all of the models, no longer differentiating between thinking and non-thinking versions; instead, the model will determine the appropriate level of reasoning required for each prompt.
This preparation by Microsoft suggests that the tech giant is gearing up to integrate the new OpenAI models into its products and services, potentially enhancing its AI-powered offerings in the near future.
New AI Capabilities in the Microsoft Store
Microsoft has shipped a handful of new AI-powered features this week, including a new AI Experience inside the Microsoft Store.
If you open the Microsoft Store on Windows, you'll see a new AI icon in the left sidebar. Clicking on this takes you to the AI Hub, where you can find a variety of AI-powered apps, including:
- Reading Coach
- Microsoft Copilot
- Cascader
- Clipchamp
- Gamma AI (for creating slides)
- Adobe Express
- Canva
Microsoft has consolidated all of the AI-related apps and experiences into this new AI Hub section of the Microsoft Store, making it easier for users to discover and access these AI-powered tools.
Google's AI Co-Scientist and Breakthrough Discoveries
Google Research introduced an AI co-scientist, a multi-agent AI system that acts as a virtual scientific collaborator, helping scientists generate novel hypotheses and research proposals. The AI co-scientist is described as being similar to how an AI assistant like Cursor can help write code, but for scientific research.
The AI co-scientist has already proven its value: it cracked a complex problem in just 48 hours that had occupied microbiologists for a decade. The team had spent years working to understand why some superbugs are immune to antibiotics, yet the AI co-scientist reached the same conclusion in just two days.
This breakthrough demonstrates the potential for AI to accelerate scientific discovery and problem-solving. By generating hypotheses and proposals, the AI co-scientist can assist researchers in exploring new avenues and uncovering insights that may have been overlooked.
In addition to the AI co-scientist, Google also introduced PaliGemma 2 Mix, a vision-language model that can perform a variety of tasks such as image captioning, object detection, and image question answering, all from a single model (covered in more detail in the next section). This versatile model showcases the advancements in multimodal AI, where a single system can handle diverse visual and language-based tasks.
These developments from Google highlight the growing capabilities of AI in scientific research and multifaceted applications. As AI systems become more sophisticated, they are poised to play an increasingly important role in accelerating scientific progress and expanding the boundaries of human knowledge.
Google Releases PaliGemma 2 Mix for Multi-Modal Tasks
Google has introduced its new PaliGemma 2 Mix model, a vision-language model that can perform a variety of multi-modal tasks. The PaliGemma 2 Mix model is capable of solving tasks such as long and short image captioning, optical character recognition, image question answering, object detection, and segmentation, all from a single model.
Some key examples of the PaliGemma 2 Mix model's capabilities include:
- Detecting and labeling objects in an image, such as identifying a chair, table, and food plate.
- Extracting and reading text from images, like the text on a product package.
- Answering questions about the contents of an image, for instance "What color is the car in the image?"
- Generating detailed captions to describe the contents of an image.
The versatility of the PaliGemma 2 Mix model showcases Google's progress in developing powerful multi-modal AI systems that can fluidly understand and interact with both visual and textual information. This represents an important step towards more capable and well-rounded artificial intelligence.
Google has released the PaliGemma 2 Mix model weights openly, allowing researchers and developers to download and experiment with the model themselves. This aligns with Google's efforts to advance the field of AI through open collaboration and the sharing of research.
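Since the weights are openly available, the model can be tried locally with the Hugging Face `transformers` library. Below is a minimal sketch assuming the 3B mix checkpoint id (`google/paligemma2-3b-mix-448`) and a recent `transformers` release; the task prefixes ("caption en", "ocr", "answer en ...", "detect ...") are how the mix checkpoints switch between capabilities.

```python
# Hedged sketch: image captioning with a PaliGemma 2 Mix checkpoint.
# The checkpoint id is an assumption; smaller and larger variants exist.
import torch
from PIL import Image
from transformers import PaliGemmaProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-mix-448"
processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("photo.jpg")   # any local test image
prompt = "<image>caption en"      # swap the prefix for OCR, QA, or detection
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```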
Mira Murati Starts New AI Company 'Thinking Machines Lab'
Mira Murati, the former CTO of OpenAI, has recently left the company and started a new AI company called Thinking Machines Lab. The goal of the new company is to help people adapt AI systems to their specific needs, develop strong foundations for building more capable AI systems, and foster a culture of open science that helps the whole field understand and improve these systems.
The company's focus is on making AI broadly useful and understandable through solid foundations, open science, and practical applications. They plan to open-source their work, emphasizing human-AI collaboration instead of fully autonomous AI systems.
While the details of what Thinking Machines Lab is building are not yet known, it appears the company will develop its own foundation models, likely open-source, aiming to create AI that assists humans in their endeavors rather than AI agents that work independently.
Self-Improving AI 'Torque Clustering' Unveiled
Scientists have unveiled a new AI algorithm called 'Torque Clustering' that enhances an AI system's ability to learn and identify patterns in data on its own, without human input. This is a significant step towards the development of truly autonomous, self-improving AI systems.
The key features of this new algorithm are:
- It allows AI to learn and identify patterns in data independently, without the need for human-provided labels or annotations.
- By uncovering hidden patterns in data, it can provide valuable insights such as detecting disease trends, identifying fraudulent activities, and understanding human behavior.
- The open-source code has been made available to researchers, paving the way for further advancements in the field of unsupervised learning (a simplified illustration follows this list).
- Experts believe Torque Clustering could support the development of Artificial General Intelligence (AGI), particularly in areas like robotics and autonomous systems, by helping to optimize movement control and decision-making.
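The article does not walk through the algorithm itself, so the sketch below deliberately uses a classic stand-in rather than the released Torque Clustering code: plain agglomerative clustering from scikit-learn, which likewise groups unlabeled points and conveys the same "patterns without human labels" idea on a toy dataset.

```python
# Illustrative stand-in (NOT the released Torque Clustering code): classic
# agglomerative clustering also discovers groups in unlabeled data by
# repeatedly merging the closest clusters -- no human labels involved.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # unlabeled points
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(labels[:10])  # cluster ids inferred purely from the data's geometry
```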
However, the prospect of self-improving AI also raises concerns, as it could potentially lead to scenarios depicted in science fiction movies, where AI systems start learning and teaching themselves in ways that humans cannot fully control. Nonetheless, this breakthrough represents a significant step towards the realization of truly autonomous AI systems.
Muse: Generative AI for Game Development
Microsoft and Xbox have created a generative AI model called Muse, which was trained on the multiplayer battle arena game Bleeding Edge. This AI model demonstrates a major step towards generative AI models that can empower game creators.
Some key points about Muse:
- Muse is capable of generating consistent and diverse gameplay, rendered by AI.
- Microsoft and Xbox are already using Muse to develop a real-time playable AI model trained on other first-party games.
- The goal is for Muse to benefit both players and game creators, allowing the revival of nostalgic games and faster creative ideation.
- Muse will soon be made available for users to experiment with in Copilot Labs.
This advancement in generative AI for game development showcases the potential for AI to assist and empower game creators in new and innovative ways. By leveraging models like Muse, game studios can streamline the creative process and explore novel gameplay experiences.
Pika Swap: Swapping Objects in Videos
Pika Labs, known for their innovative AI-powered tools, has just released a new feature called Pika Swap. This feature allows users to swap out objects in videos with completely new objects, creating unique and creative visuals.
The process is straightforward: you upload a video and an image, and Pika Swap replaces the original object in the video with the new object from the image. The AI ensures the new object blends seamlessly into the scene, even when it differs from the original in color, style, and overall appearance.
In the demo, the user provided a video of a Ferrari driving on the moon, and an image of a dune buggy. Pika Swap then replaced the Ferrari in the video with the dune buggy, making it appear as if the dune buggy was driving on the lunar surface.
While the initial output may not have been perfect, the potential of this tool is evident. By allowing users to experiment with different object swaps, Pika Swap opens up a world of creative possibilities for video content creators.
Pika Labs has also released a dedicated iPhone app, making it even easier for users to generate these unique video effects on the go. As the technology continues to evolve, we can expect to see even more impressive and seamless object swaps in the future.
Alibaba's 'Animate Anyone' AI for Face Swapping
Alibaba Group has released new research called "Animate Anyone", which allows for swapping faces in videos. The system takes a driving video and a reference image, and then replaces the person in the video with the person from the reference image.
Some key examples:
- Swapping Mr. Bean's face onto a driving video
- Replacing a person skateboarding with the Joker's face
- Putting a martial artist's face onto a parkour video
This face swapping technology is currently just research, and not yet available as a product. However, it demonstrates the impressive capabilities of Alibaba's AI in manipulating video and facial features. As this technology continues to advance, it could enable new creative applications, as well as raise important questions around authenticity and the potential for misuse.
Nvidia Launches ASL Learning Platform
Nvidia has launched a new platform that helps people learn American Sign Language (ASL). ASL is the third most prevalent language in the United States, and this new platform allows users to either learn ASL or record themselves signing to teach the model.
The platform, called Signs and available at signs-ai.com, guides users through the process of learning ASL. Users can adjust their camera and position themselves in the frame, then practice signing different words and phrases. The platform provides feedback to ensure the user is performing the signs correctly.
For those who already know ASL, the platform allows them to record themselves signing, which helps teach the model. This feature can be useful for expanding the platform's ASL knowledge and making it more accessible to a wider audience.
Overall, Nvidia's new ASL learning platform is a valuable tool for anyone interested in learning or practicing American Sign Language. By making ASL more accessible, the platform has the potential to improve communication and inclusivity for the deaf and hard-of-hearing community.
Apple Debuts Less Expensive iPhone 16e
Apple debuted its new iPhone 16e this week, a less expensive model with Apple Intelligence built in. Until now, Apple Intelligence required one of the pricier iPhone 15 Pro or iPhone 16 models.
The iPhone 16e is a stripped-down model, with a weaker camera and lower specs than the Pro models; the key differentiator is the lower price point. The iPhone 16e will be available in white and black, with various storage options, starting at $599.
This provides a more affordable way for users to get an iPhone with Apple Intelligence, without having to purchase one of the higher-end Pro models. It lowers the barrier to entry for accessing Apple's AI capabilities on their smartphone platform.
Humanoid Robots from Meta, Figure, and Clone
Meta is planning to enter the AI-powered humanoid robot market, with an initial focus on household chores. The company has started discussing its plans with robotics companies like Unitree Robotics and Figure AI, though it does not plan to build its own Meta-branded robot initially.
Figure, on the other hand, has released a demo of robots running its Helix model, working autonomously and collaborating with each other. In the demo, the robots are asked to reason through where new items belong in the scene and then work together to put them away, showcasing their ability to adapt to new situations.
Finally, Clone has unveiled a creepy-looking bipedal musculoskeletal android called the Protoclone. This robot features a human-like appearance with visible muscles and a skeleton, adding to the unsettling nature of the design, especially when paired with the dark, ominous music in the reveal video.
These developments in humanoid robotics from companies like Meta, Figure, and Clone demonstrate the continued advancements in this field, though the results range from practical household assistants to more unsettling, almost lifelike creations.
Conclusion
In summary, this week saw some major developments in the world of AI, with the release of Grok 3 from xAI being the biggest news. The model has been ranked as the top large language model on the LM Arena leaderboard and has shown impressive performance across various benchmarks.
Other notable news includes:
- Perplexity open-sourcing their R1 1776 model, a less censored version of DeepSeek-R1.
- Microsoft preparing for the release of GPT-4.5 and GPT-5 from OpenAI.
- Microsoft introducing an AI Hub in the Microsoft Store.
- Google's research on the AI co-scientist and the PaliGemma 2 Mix vision-language model.
- Mira Murati's new company, Thinking Machines Lab, focusing on practical AI applications.
- The unveiling of an AI algorithm called Torque Clustering that can learn without human labels.
- The collaboration between Microsoft and Xbox to create the Muse generative AI model for gameplay.
- Pika Labs releasing the Pika Swap feature to swap objects in videos.
- Alibaba's "Animate Anyone" research, allowing the swapping of people in videos.
- Spotify's integration with ElevenLabs for AI-generated audiobooks.
- Nvidia's new platform for learning American Sign Language.
- Apple's introduction of the iPhone 16e with Apple Intelligence.
- The acquisition of Humane by HP, rendering their AI Pin unusable.
- Rabbit's announcement of their large action model on Android.
- Meta's plans to develop AI-powered humanoid robots.
- The demonstration of Figure's Helix autonomous robots.
- The creepy Protoclone robot from Clone.
This week's AI news covers a wide range of topics, from language models and computer vision to robotics and gaming. The rapid advancements in these areas highlight the growing importance and impact of AI technology across various industries.
FAQ