Llama 3 currently comes in two sizes, with 8B and 70B parameters. (The B stands for billions; the parameter count indicates how complex a model is and how much of its training it can absorb.) So far the models provide only text-based responses, but Meta says they are a “huge improvement” compared to the previous version. Llama 3 showed more variety in responding to prompts, had fewer false refusals to answer questions, and was able to reason better. Meta also said that Llama 3 understands more instructions and writes better code than before.
In the post, Meta claims that both sizes of Llama 3 outperformed similarly sized models, such as Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3, in certain benchmark tests. In the MMLU benchmark, which measures general knowledge, Llama 3 8B performed significantly better than both Gemma 7B and Mistral 7B, and Llama 3 70B slightly outperformed Gemini Pro 1.5.
(It is perhaps noteworthy that Meta’s 2,700-word post does not mention OpenAI’s flagship model, GPT-4.)
Benchmark testing can help gauge how powerful an AI model is, but it is imperfect. Datasets used to benchmark models have been found to be part of the models’ training data, which means a model may already know the answers to the questions evaluators ask.
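One common way to look for this kind of overlap is to check whether word sequences from benchmark questions appear verbatim in the training text. The snippet below is a minimal sketch of that idea, not a description of how Meta or any benchmark maintainer actually tests for contamination. The function names, the 5-word window, and the toy data are all assumptions chosen for illustration.

```python
import re


def ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items, training_docs, n=5):
    """Hypothetical helper: flag benchmark items that share any n-gram with training text.

    Returns (fraction of items flagged, list of flagged items).
    The window size n=5 is an illustrative assumption, not a standard.
    """
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)

    flagged = [item for item in benchmark_items if ngrams(item, n) & train_grams]
    rate = len(flagged) / len(benchmark_items) if benchmark_items else 0.0
    return rate, flagged


# Toy data, invented for this example only.
training_docs = ["Quiz answers: the capital of France is Paris, as everyone knows."]
benchmark_items = [
    "The capital of France is Paris",
    "Which planet is closest to the sun?",
]

rate, flagged = contamination_rate(benchmark_items, training_docs)
print(f"{rate:.0%} of benchmark items overlap with the training text: {flagged}")
```

Exact matching like this only catches verbatim leaks. A reworded or translated question would slip through, which is one reason contamination is hard to rule out.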
Meta says human evaluators also rated Llama 3 higher than other models, including OpenAI’s GPT-3.5. Meta says it created a new dataset for human evaluators to emulate real-world scenarios in which Llama 3 might be used. This dataset included use cases such as requesting advice, summarizing, and creative writing. The company said the team working on the model did not have access to this new evaluation data, and that it did not influence the model’s performance.
“This assessment set includes 1,800 prompts covering 12 key use cases: Seeking Advice, Brainstorming, Classification, Answering Closed Questions, Coding, Creative Writing, Extraction, Inhabiting a Character/Persona, Answering Open Questions, Reasoning, Rewriting, and Summarizing,” Meta said in a blog post.
Llama 3 is expected to get larger model sizes (able to understand longer strings of instructions and data) and to be capable of more multimodal responses, such as “generate images” and “transcribe audio files.” Meta said that these larger versions, which have more than 400B parameters and can ideally learn more complex patterns than smaller versions of the model, are currently being trained, but initial performance testing shows that these models can answer many of the questions posed by benchmarking.
However, Meta did not release previews of these larger models or compare them to other larger models such as GPT-4.