Building with Meta Series Models

Explore the power of the Meta series models and learn how to apply Llama 3.1 and Llama 3.2 to AI model prototyping.
Core content:
1. A detailed introduction to Meta series models Llama 3.1 and Llama 3.2
2. Model variants and their use cases in different scenarios
3. Code examples showing the models' unique capabilities, including native function calling and synthetic data generation
Introduction
This course will cover:
• Explore the two main Meta series models - Llama 3.1 and Llama 3.2
• Understand the use cases and scenarios for each model
• Code examples that showcase the unique capabilities of each model
Meta Series Models
In this course, we'll explore two models in the Meta series or "Llama family" - Llama 3.1 and Llama 3.2.
These models come in different variants and are available on the GitHub Models marketplace. Below are more details on using GitHub Models for AI model prototyping.
Model variants:
• Llama 3.1 - 70B Instruct
• Llama 3.1 - 405B Instruct
• Llama 3.2 - 11B Vision Instruct
• Llama 3.2 - 90B Vision Instruct
NOTE: Llama 3 is also available on GitHub Models but is not covered in this course.
Llama 3.1
Llama 3.1, at 405 billion parameters, fits into the open source LLM category.
This model is an upgrade to the earlier Llama 3 release, offering:
• Larger context window - 128k tokens vs 8k tokens
• Larger maximum output tokens - 4,096 vs 2,048
• Better multilingual support - due to an increase in training tokens
These enable Llama 3.1 to handle more complex use cases when building GenAI applications, including:
• Native function calling - the ability to call external tools and functions outside of the LLM workflow
• Better RAG performance - thanks to the larger context window
• Synthetic data generation - the ability to create usable data for tasks such as fine-tuning (a sketch follows below)
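Synthetic data generation does not get a dedicated example later in this course, so here is a minimal sketch of the idea using the same ChatCompletionsClient pattern as the examples below. The prompt wording and the question/answer JSON schema are assumptions chosen for illustration, not a fixed API.

import os
import json

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "meta-llama-3.1-405b-instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# Ask the model to generate labeled training examples as JSON.
# The instruction wording and output schema are illustrative assumptions.
system_prompt = (
    "You generate synthetic training data. "
    "Return only a JSON array of objects with the keys 'question' and 'answer'."
)
user_prompt = "Generate 5 question/answer pairs about renewable energy for fine-tuning a support bot."

response = client.complete(
    messages=[
        SystemMessage(content=system_prompt),
        UserMessage(content=user_prompt),
    ],
    model=model_name,
)

raw = response.choices[0].message.content
try:
    examples = json.loads(raw)  # may fail if the model adds extra prose around the JSON
    print(f"Generated {len(examples)} examples")
except json.JSONDecodeError:
    print("Model output was not valid JSON:\n", raw)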
Native function calling
Llama 3.1 has been fine-tuned to be more effective at making function or tool calls. It also has two built-in tools that the model can recognize as needed based on the user's prompt. These tools are:
• Brave Search - can be used to get up-to-date information, such as the weather, by performing a web search
• Wolfram Alpha - can be used for more complex mathematical calculations, so there is no need to write your own functions
You can also create your own custom tools that the LLM can call.
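As a rough illustration of what a custom tool could look like, the sketch below describes a hypothetical get_current_weather function to the model as a JSON schema inside the system prompt, loosely following the prompt-format conventions documented for Llama 3.1. The function name, its parameters, and the exact prompt wording are assumptions made for this example.

import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "meta-llama-3.1-405b-instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# A hypothetical custom tool, described to the model as a JSON schema.
custom_tool_prompt = """
You have access to the following function:

{
    "name": "get_current_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "Name of the city"}
        },
        "required": ["city"]
    }
}

If a function call is needed, reply only with a JSON object of the form
{"name": <function-name>, "parameters": {<arguments>}}.
"""

response = client.complete(
    messages=[
        SystemMessage(content=custom_tool_prompt),
        UserMessage(content="Should I bring an umbrella in Stockholm today?"),
    ],
    model=model_name,
)

# Expected (but not guaranteed) output is something like:
# {"name": "get_current_weather", "parameters": {"city": "Stockholm"}}
print(response.choices[0].message.content)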
In the following code example:
• We define the available tools (brave_search, wolfram_alpha) in the system prompt.
• Send a user prompt asking about the weather in a specific city.
• The LLM responds with a tool call to the Brave Search tool, as shown below:
<|python_tag|>brave_search.call(query="Stockholm weather")
Note: This example only makes the tool call. If you want to get the results, you need to create a free account on the Brave API page and define the function itself; a sketch of dispatching the call yourself follows the example below.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "meta-llama-3.1-405b-instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# System prompt that enables the built-in tools, using the Llama 3.1 prompt format.
tool_prompt = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 23 July 2024
You are a helpful assistant<|eot_id|>
"""

messages = [
    SystemMessage(content=tool_prompt),
    UserMessage(content="What is the weather in Stockholm?"),
]

response = client.complete(messages=messages, model=model_name)
print(response.choices[0].message.content)
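To actually execute the call, you could parse the returned tool-call string and route it to your own function. Below is a minimal sketch that assumes the response has the <|python_tag|>brave_search.call(...) form shown above; run_brave_search is a hypothetical placeholder you would implement against the Brave Search API.

import re

def run_brave_search(query: str) -> str:
    # Hypothetical placeholder - a real implementation would call the Brave Search API
    # with your own API key and return the results.
    return f"(search results for '{query}' would go here)"

# The tool call returned by the model in the example above:
tool_call = '<|python_tag|>brave_search.call(query="Stockholm weather")'

match = re.search(r'brave_search\.call\(query="([^"]+)"\)', tool_call)
if match:
    print(run_brave_search(match.group(1)))
else:
    print("No brave_search call found:", tool_call)

In a full workflow, the tool result would then be sent back to the model as an additional message so it can compose a final answer for the user.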
Llama 3.2
Although Llama 3.1 is an LLM, one of its limitations is multimodality - that is, the ability to use different types of input, such as images, as prompts and respond to them. This capability is one of the main features of Llama 3.2, which also offers:
• Multimodality - able to evaluate both text and image prompts
• Small and medium sized variants (11B and 90B) - providing flexible deployment options
• Text-only variants (1B and 3B) - allowing the model to be deployed on edge/mobile devices with low latency (a text-only sketch appears after the multimodal example below)
Multimodal support represents a big step forward in the field of open source models. The following code example takes both image and text prompts to obtain an image analysis from Llama 3.2 90B.
Multimodal support in Llama 3.2
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    SystemMessage,
    UserMessage,
    TextContentItem,
    ImageContentItem,
    ImageUrl,
    ImageDetailLevel,
)
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "Llama-3.2-90B-Vision-Instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(
            content="You are a helpful assistant that describes images in details."
        ),
        UserMessage(
            content=[
                TextContentItem(text="What's in this image?"),
                # Load a local image file and attach it to the user message.
                ImageContentItem(
                    image_url=ImageUrl.load(
                        image_file="sample.jpg",
                        image_format="jpg",
                        detail=ImageDetailLevel.LOW,
                    )
                ),
            ],
        ),
    ],
    model=model_name,
)

print(response.choices[0].message.content)
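For the text-only 1B/3B variants mentioned earlier, the same chat-completions client can be used without any image content. Below is a minimal sketch; the model name Llama-3.2-3B-Instruct is an assumption, so check the GitHub Models catalog for the exact text-only variants that are available.

import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"

# Assumed name for a small text-only variant - verify against the model catalog.
model_name = "Llama-3.2-3B-Instruct"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Summarize what multimodality means in one sentence."),
    ],
    model=model_name,
)

print(response.choices[0].message.content)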