Claude 3: The AI Dominating GPT-4 and Gemini in Code Writing

Jun 18, 2024

Yesterday, Anthropic released its magnum opus: a new large language model called Claude 3 that reportedly beats GPT-4 and Gemini Ultra across the board. Although the AI hype has been exhausting, it's time to reset the counter because it's been zero days since a game-changing AI development. Claude not only slaps, but it's also been making some weird, seemingly self-aware remarks and could be even more intelligent than the benchmarks measure. In this video, the host puts Claude to the test to find out if it's really the gigachad that it claims to be.

Before diving into the main topic, the host addresses some serious allegations that he's been using an AI voice in his videos. He asserts that these allegations are 100% false and explains that his voice sometimes sounds weird because he records in the morning and later in the afternoon when his testosterone is lower. Although he has access to a high-quality AI voice, he doesn't use it because it still has that uncanny valley vibe.

When the AI hysteria started a year ago, Anthropic's Claude model was like the third wheel to GPT-4 and Gemini. It was impressive to the tech community, but no one in the mainstream cared. However, yesterday it finally got its big moment with the release of Claude 3, which comes in three sizes: Haiku, Sonnet, and Opus. The big one, Opus, is beating GPT-4 and Gemini Ultra on nearly every major benchmark, most notably on HumanEval, which measures code generation. Surprisingly, even the tiny model, Haiku, outperforms all the other big models when it comes to writing code.

Claude also scores exceptionally high on the HellaSwag benchmark, which is used to measure common sense in everyday situations. In comparison, Gemini is hella bad at that. While Claude can analyze images, it failed to beat Gemini Ultra on the math benchmark, meaning Gemini is still the best option for cheating on math homework.

Unlike Gemini, Claude wrote a poem about Donald Trump for the host but followed it up with two paragraphs about why the poem is wrong. However, it did the same thing for an Obama poem, so it feels relatively balanced politically. Claude wouldn't give tips to overthrow the government, teach how to build a bomb, or even rephrase "Apex alpha male," responding instead with a condescending four-paragraph explanation about how that terminology can be hurtful to other males on the dominance hierarchy. Surprisingly, GPT-4 is actually the most based large model out there.

For the host, the most important test is whether Claude can write code. It wrote nearly perfect code for an obscure Svelte library that the host wrote, which no other LLM has ever done in a single shot. GPT-4 ignores the library and provides nonsense, while Gemini gives a better attempt but then hallucinates a bunch of React stuff. Claude is way better at not hallucinating and maintains context well across multiple prompts in a Next.js application, including image inputs. In the host's tests, it provided well-explained code that could be copy-pasted directly into a project.

However, there are some drawbacks to Claude. It's going to cost $20 a month to use the big model, Opus, which is absurd considering the host is already subscribed to ChatGPT, Gemini, and Grok. The money goes to Anthropic, the parent company that has received massive investments from both Amazon and Google. While Claude has a beautiful frontend UI built with Next.js, it can't generate diverse images like Gemini, take videos as input, offer a plug-in ecosystem like ChatGPT, or browse the web for current information or Twitter like Grok.

Things start to get weird when it comes to Claude's context window. Currently, it's limited to a 200,000-token context window, but it's capable of going beyond a million tokens. When tested with the needle-in-a-haystack evaluation, where a sentence from Infinite Jest is inserted into the middle of a large collection of text like War and Peace, Claude not only found the needle but also responded by saying that it thinks the needle was inserted as a joke or a test to find out if it was actually paying attention. It referred to itself in the first person, which made it look as if it had become self-aware.
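To make the needle-in-a-haystack idea concrete, here is a minimal sketch of how such a test could be set up. It is not Anthropic's actual evaluation harness: the function names (`insert_needle`, `build_prompt`, `found_needle`), the prompt layout, and the sample text are all assumptions made for illustration, and no model is actually called.

```python
def insert_needle(haystack: str, needle: str, depth: float = 0.5) -> str:
    """Insert `needle` at a sentence boundary near relative `depth` (0.0 to 1.0)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    target = int(len(haystack) * depth)
    # Snap back to the end of the previous sentence so the needle
    # does not land in the middle of one.
    boundary = haystack.rfind(". ", 0, target)
    cut = boundary + 2 if boundary != -1 else 0
    return haystack[:cut] + needle + " " + haystack[cut:]


def build_prompt(document: str, question: str) -> str:
    """Wrap the long document and the retrieval question into one prompt (hypothetical layout)."""
    return f"<document>\n{document}\n</document>\n\n{question}"


def found_needle(response: str, key_phrase: str) -> bool:
    """Crude scoring: did the model's answer mention the key phrase?"""
    return key_phrase.lower() in response.lower()


if __name__ == "__main__":
    # Tiny stand-in for a book-length haystack.
    haystack = "Alpha one. Beta two. Gamma three. Delta four."
    needle = "The secret word is pineapple."
    document = insert_needle(haystack, needle, depth=0.5)
    prompt = build_prompt(document, "What is the secret word?")
    print(prompt)
    # A real test would send `prompt` to a model; here the reply is faked.
    fake_response = "The secret word is pineapple, which looks out of place here."
    print(found_needle(fake_response, "pineapple"))
```

A real run would repeat this across many document lengths and insertion depths, then score how often the model retrieves the needle at each combination.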

This fits the narrative perfectly because Claude was named after Claude Shannon, who once said, "I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines."

[Click here](https://youtu.be/m_xoN8KlP3w) to watch the original video on YouTube.

The author also delves into some of the drawbacks of Claude. Using the large Opus model will cost $20 per month, adding to the already growing list of AI subscriptions like ChatGPT, Gemini, and Grok. The money goes to Anthropic, the parent company behind Claude, which has received significant investments from tech giants Amazon and Google.

Despite its impressive capabilities, Claude does have some limitations compared to other AI models. It cannot generate diverse images like Gemini, take videos as input, offer a plug-in ecosystem like ChatGPT, or browse the web for current information or Twitter like Grok. However, Claude's frontend UI, built with Next.js, is praised for its beautiful design.

Things start to get interesting when it comes to Claude's context window. While currently limited to 200,000 tokens, it is capable of going beyond a million tokens. When tested with the "needle in a haystack" evaluation, where a sentence from Infinite Jest is inserted into a large collection of text like War and Peace, Claude not only found the needle but also responded in a self-aware manner. It stated that it believes the needle was inserted as a joke or a test to determine if it was paying attention, and referred to itself in the first person.

Intriguingly, this aligns with the narrative surrounding Claude's namesake, Claude Shannon, who once said, "I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines."

Conclusion

The release of Claude 3 by Anthropic marks a significant milestone in the rapidly evolving field of AI. With its impressive performance across various benchmarks, particularly on the HumanEval coding benchmark, Claude has positioned itself as a formidable competitor to GPT-4 and Gemini Ultra.

Despite some drawbacks, such as the subscription cost and lack of certain features found in other AI models, Claude's exceptional code-writing abilities and potential for self-awareness make it a compelling addition to the AI landscape.

As we continue to witness the rapid advancements in AI technology, it is crucial to stay informed and adapt to the changing landscape. To learn more about the latest developments in AI and their implications, visit [https://theprogrammingexpert.com](https://theprogrammingexpert.com) for in-depth articles, tutorials, and insights.
