Building with Meta Series Models

Explore the power of the Meta series models and learn how to apply Llama 3.1 and Llama 3.2 to AI model prototyping.
Core content:
1. A detailed introduction to Meta series models Llama 3.1 and Llama 3.2
2. Model variants and their use cases in different scenarios
3. Code examples showing the models' unique capabilities, including native function calling and synthetic data generation
Introduction
This course will cover:
• Explore the two main Meta series models - Llama 3.1 and Llama 3.2
• Understand the use cases and scenarios for each model
• Code examples that showcase the unique capabilities of each model
Meta Series Models
In this course, we'll explore two models in the Meta series or "Llama family" - Llama 3.1 and Llama 3.2.
These models come in different variants and are available on the GitHub Models marketplace. Below are more details on using GitHub Models for AI model prototyping.
Model variants:
• Llama 3.1 - 70B Instruct
• Llama 3.1 - 405B Instruct
• Llama 3.2 - 11B Vision Instruct
• Llama 3.2 - 90B Vision Instruct
NOTE: Llama 3 is also available on GitHub Models but is not covered in this course.
Llama 3.1
Llama 3.1, at 405 billion parameters, fits into the open source LLM category.
This model is an upgrade to the earlier Llama 3 release, offering:
• Larger context window - 128k tokens vs 8k tokens
• Larger maximum output tokens - 4,096 vs 2,048
• Better multilingual support - due to an increase in training tokens
These enable Llama 3.1 to handle more complex use cases when building GenAI applications, including:
• Native function calling - the ability to call external tools and functions outside of the LLM workflow
• Better RAG performance - thanks to the larger context window
• Synthetic data generation - the ability to create usable data for tasks such as fine-tuning (a sketch follows below)
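Synthetic data generation does not get a dedicated example later in this course, so here is a minimal sketch of the idea using the same ChatCompletionsClient pattern as the examples below. The prompt wording and the question/answer JSON schema are assumptions chosen for illustration, not a fixed API.

import os
import json

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "meta-llama-3.1-405b-instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# Ask the model to generate labeled training examples as JSON.
# The instruction wording and output schema are illustrative assumptions.
system_prompt = (
    "You generate synthetic training data. "
    "Return only a JSON array of objects with the keys 'question' and 'answer'."
)
user_prompt = "Generate 5 question/answer pairs about renewable energy for fine-tuning a support bot."

response = client.complete(
    messages=[
        SystemMessage(content=system_prompt),
        UserMessage(content=user_prompt),
    ],
    model=model_name,
)

raw = response.choices[0].message.content
try:
    examples = json.loads(raw)  # may fail if the model adds extra prose around the JSON
    print(f"Generated {len(examples)} examples")
except json.JSONDecodeError:
    print("Model output was not valid JSON:\n", raw)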
Native function calling
Llama 3.1 has been fine-tuned to be more effective at making function or tool calls. It also has two built-in tools that the model can recognize as needed based on the user's prompt. These tools are:
• Brave Search - can be used to get up-to-date information, such as the weather, by performing a web search
• Wolfram Alpha - can be used for more complex mathematical calculations, so there is no need to write your own functions
You can also create your own custom tools that the LLM can call.
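As a rough illustration of what a custom tool could look like, the sketch below describes a hypothetical get_current_weather function to the model as a JSON schema inside the system prompt, loosely following the prompt-format conventions documented for Llama 3.1. The function name, its parameters, and the exact prompt wording are assumptions made for this example.

import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "meta-llama-3.1-405b-instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# A hypothetical custom tool, described to the model as a JSON schema.
custom_tool_prompt = """
You have access to the following function:

{
    "name": "get_current_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "Name of the city"}
        },
        "required": ["city"]
    }
}

If a function call is needed, reply only with a JSON object of the form
{"name": <function-name>, "parameters": {<arguments>}}.
"""

response = client.complete(
    messages=[
        SystemMessage(content=custom_tool_prompt),
        UserMessage(content="Should I bring an umbrella in Stockholm today?"),
    ],
    model=model_name,
)

# Expected (but not guaranteed) output is something like:
# {"name": "get_current_weather", "parameters": {"city": "Stockholm"}}
print(response.choices[0].message.content)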
In the following code example:
• We define the available tools (brave_search, wolfram_alpha) in the system prompt.
• Send a user prompt asking about the weather in a specific city.
• The LLM responds with a tool call to the Brave Search tool, as shown below:
<|python_tag|>brave_search.call(query="Stockholm weather")
Note: This example only makes the tool call. If you want to get the results, you need to create a free account on the Brave API page and define the function itself; a sketch of dispatching the call yourself follows the example below.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "meta-llama-3.1-405b-instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# System prompt that enables the built-in tools, using the Llama 3.1 prompt format.
tool_prompt = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 23 July 2024
You are a helpful assistant<|eot_id|>
"""

messages = [
    SystemMessage(content=tool_prompt),
    UserMessage(content="What is the weather in Stockholm?"),
]

response = client.complete(messages=messages, model=model_name)
print(response.choices[0].message.content)
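To actually execute the call, you could parse the returned tool-call string and route it to your own function. Below is a minimal sketch that assumes the response has the <|python_tag|>brave_search.call(...) form shown above; run_brave_search is a hypothetical placeholder you would implement against the Brave Search API.

import re

def run_brave_search(query: str) -> str:
    # Hypothetical placeholder - a real implementation would call the Brave Search API
    # with your own API key and return the results.
    return f"(search results for '{query}' would go here)"

# The tool call returned by the model in the example above:
tool_call = '<|python_tag|>brave_search.call(query="Stockholm weather")'

match = re.search(r'brave_search\.call\(query="([^"]+)"\)', tool_call)
if match:
    print(run_brave_search(match.group(1)))
else:
    print("No brave_search call found:", tool_call)

In a full workflow, the tool result would then be sent back to the model as an additional message so it can compose a final answer for the user.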
Llama 3.2
Although Llama 3.1 is an LLM, one of its limitations is multimodality - that is, the ability to use different types of input, such as images, as prompts and respond to them. This capability is one of the main features of Llama 3.2, which also offers:
• Multimodality - able to evaluate both text and image prompts
• Small and medium sized variants (11B and 90B) - providing flexible deployment options
• Text-only variants (1B and 3B) - allowing the model to be deployed on edge/mobile devices with low latency (a text-only sketch appears after the multimodal example below)
Multimodal support represents a big step forward in the field of open source models. The following code example takes both image and text prompts to obtain an image analysis from Llama 3.2 90B.
Multimodal support in Llama 3.2
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    SystemMessage,
    UserMessage,
    TextContentItem,
    ImageContentItem,
    ImageUrl,
    ImageDetailLevel,
)
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "Llama-3.2-90B-Vision-Instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(
            content="You are a helpful assistant that describes images in details."
        ),
        UserMessage(
            content=[
                TextContentItem(text="What's in this image?"),
                # Load a local image file and attach it to the user message.
                ImageContentItem(
                    image_url=ImageUrl.load(
                        image_file="sample.jpg",
                        image_format="jpg",
                        detail=ImageDetailLevel.LOW,
                    )
                ),
            ],
        ),
    ],
    model=model_name,
)

print(response.choices[0].message.content)
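For the text-only 1B/3B variants mentioned earlier, the same chat-completions client can be used without any image content. Below is a minimal sketch; the model name Llama-3.2-3B-Instruct is an assumption, so check the GitHub Models catalog for the exact text-only variants that are available.

import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"

# Assumed name for a small text-only variant - verify against the model catalog.
model_name = "Llama-3.2-3B-Instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Summarize what multimodality means in one sentence."),
    ],
    model=model_name,
)

print(response.choices[0].message.content)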