This help article provides a high-level overview of the different AI models available for use when creating apps. The models cover capabilities like text generation, image generation, audio generation, code execution, and more.
For more details on using specific models, please see the other documentation and tutorial videos.
However, always do your own research to get a more complete understanding of each model.
LLM (Large Language Models)
OpenAI:
ChatGPT 3.5: An earlier version of OpenAI's conversational AI, capable of understanding and generating human-like text for a variety of applications.
ChatGPT 4: An advanced language model with improved reasoning, understanding, and generation capabilities, supporting more complex and nuanced interactions.
ChatGPT 4 Turbo: An optimized version of ChatGPT 4, offering faster response times and enhanced performance for real-time applications.
ChatGPT 4o (Omni): A multimodal model capable of processing both text and image inputs, enabling more comprehensive understanding and generation.
ChatGPT o1 Preview (Reasoning): An experimental model focusing on advanced reasoning tasks, designed to handle complex problem-solving scenarios.
ChatGPT o1 Mini (Reasoning): A lightweight version tailored for reasoning tasks, balancing performance with computational efficiency.
GPT Instruct: Fine-tuned to follow specific instructions, making it suitable for tasks requiring precise and directive outputs.
Claude (Anthropic):
Claude Instant v1: A streamlined version of Claude, designed for rapid responses in conversational AI applications.
Claude v2: An enhanced model with improved language understanding and generation, offering more accurate and context-aware interactions.
Claude v2.1: A refined update to v2, providing better accuracy, context handling, and overall performance.
Claude 3 Opus: The most advanced model in the Claude 3 family, excelling in complex tasks across various domains, including text and image processing.
Claude 3 Sonnet: Balances performance and speed, making it suitable for a wide range of applications requiring both skill and efficiency.
Claude 3 Haiku: The fastest and most compact model in the Claude 3 family, designed for near-instant responsiveness, ideal for tasks like content moderation, inventory management, and quick translations.
Claude 3.5 Sonnet: An improved version of Claude 3 Sonnet, offering enhanced performance for creative and conversational tasks.
Perplexity:
Llama 3.1 8B Instruct: An efficient model fine-tuned for following instructions, suitable for various directive-based tasks.
Llama 3.1 70B Instruct: A larger model designed to handle complex instructions with higher accuracy and depth.
Llama 3.1 Sonar Small Online: A lightweight model optimized for search queries and online information retrieval.
Llama 3.1 Sonar Large Online: A more robust version for comprehensive online search and data extraction tasks.
Llama 3.1 Sonar Small Chat: A compact conversational model tailored for quick and efficient interactions.
Llama 3.1 Sonar Large Chat: An advanced conversational model designed for detailed and context-rich dialogues.
Gemini (Google):
Gemini Pro 1.0: Google's initial professional-grade conversational AI, offering advanced language understanding and generation.
Gemini Pro 1.5: An improved version with enhanced context handling and multimodal capabilities, supporting both text and image inputs.
Groq:
Llama 3.1 8B: A high-performance, lightweight language model suitable for a range of natural language processing tasks.
Llama 3.1 70B: A larger model offering sophisticated language understanding for complex applications.
Llama 3 Groq 8B Tool Use: A specialized version fine-tuned for tasks involving API interactions, structured data manipulation, and complex tool use, excelling in function calling scenarios.
Llama 3 Groq 70B Tool Use: A scaled-up model with enhanced capabilities for advanced tool integration and utility tasks.
Meta Llama3 8B: A lightweight model from Meta's Llama 3 series, designed for efficient natural language processing tasks.
Meta Llama3 70B: A larger model in the Llama 3 series, offering advanced capabilities for complex language understanding and generation.
Mixtral 8x7B: A sparse mixture-of-experts model from Mistral AI that routes each token through a small subset of its eight expert networks, giving balanced performance across various tasks at a lower inference cost than a dense model of similar total size.
Gemma 7B: A lightweight open model from Google, suited to applications that require efficient processing.
Gemma 2 9B: The next generation of Gemma, providing more robust functionality for a broader range of tasks.
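The function-calling scenario mentioned for the Llama 3 Groq Tool Use models can be pictured with a minimal sketch. It assumes the model returns a tool call as a JSON string with `name` and `arguments` fields. That shape, the `get_weather` helper, and the `dispatch_tool_call` function are hypothetical illustrations, not the actual API of Groq or of this tool.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: a real app would call a weather service here."""
    return f"Weather for {city} is not available in this sketch."


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> str:
    """Parse a JSON tool call (assumed shape) and run the matching function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# A hand-written tool call standing in for model output.
example_call = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch_tool_call(example_call)
print(result)
```

In practice the tool's return value would be sent back to the model so it can compose the final answer. The registry keeps the set of callable functions explicit, so an unexpected tool name fails loudly instead of running arbitrary code.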
Image Generation
OpenAI:
- GPT-4 With Vision: Accepts text and image inputs, enabling complex multimodal queries. It returns detailed text responses based on the visual and textual data rather than generating new images.
DALLE:
- DALLE-2: Generates creative and realistic images from textual descriptions, showcasing advanced understanding of language and imagery.
- DALLE-3: An enhanced version with improved coherence, detail, and the ability to produce more complex and nuanced images from text prompts.
Stability.ai:
- Stable Diffusion XL v1.0: A high-resolution image generation model capable of producing detailed and photorealistic images from text inputs.
- Stable Diffusion v3.0: A versatile text-to-image model designed for generating high-quality, creative outputs, with improved image clarity and style diversity.
- Upscale Image (ESRGAN x2 V1): An upscaling tool that doubles image resolution while preserving details and quality.
Scenario:
- Upscale Image: A tool for improving the resolution of images with AI-based processing, suitable for both artistic and practical purposes.
ImagineAPI (MidJourney):
- Text to Image: Converts textual prompts into stunning, imaginative visuals, popular among creators for its artistic touch.
PicsArt:
- Image Vectorizer: Transforms raster images into scalable vector graphics for design and illustration purposes.
- Remove Background: Quickly isolates subjects in images by removing backgrounds, useful for creating clean visuals.
GoAPI (MidJourney):
- Text to Image: Generates creative visuals from textual descriptions with a focus on artistry and detail.
- Describe Image: Analyzes and generates textual descriptions for images, making them searchable or accessible.
GoAPI (Stable Diffusion):
- Text to Image: Produces photorealistic or stylized images based on textual inputs, leveraging Stable Diffusion’s capabilities.
Ideogram:
- Generate Images V1 & V1 Turbo: The base text-to-image model and its faster Turbo variant, suitable for general creative tasks.
- Generate Images V2 & V2 Turbo: The updated model and its faster Turbo variant, offering more detail and better realism.
- Upscale Image: Enhances image resolution for clearer and more detailed visuals.
- Describe Image: Creates descriptive captions for images, useful for accessibility or organization.
Flux Text to Image:
- Flux 1.1 Pro: A professional-grade tool for generating high-quality images from text, with a focus on realism and detail.
- Flux 1 Pro: A standard version offering balanced speed and image quality.
- Flux 1 Dev: A developer-focused version for experimenting with image generation.
Audio / Video
RunwayML:
- Gen 3A Turbo (Image to Video): Converts images into short video animations, allowing creators to bring static visuals to life.
D-ID:
- Talking Avatar Creation: Creates AI-powered talking avatars, combining video synthesis with text-to-speech technologies.
Eleven Labs:
- Text to Speech - Eleven English v1, v2: Converts text to natural-sounding English speech, ideal for narration or dialogue creation.
- Text to Speech - Eleven Multilingual v1: Offers multilingual text-to-speech capabilities for a global audience.
- Text to Speech - Eleven Turbo v2, v2.5: Faster versions of Eleven Labs' TTS models, optimized for performance without sacrificing quality.
- Speech to Speech - Eleven English v2, Multilingual v2: Converts spoken audio into another voice while preserving the speaker's natural intonation, in English or multiple languages.
Kling AI:
- Video Generation: An AI model for creating short videos from textual descriptions or predefined templates.
Audio Transcription (Groq):
- Whisper V3 Large: Transcribes audio into text with high accuracy, supporting multiple languages and accents.
Audio Translation (Groq):
- Whisper V3 Large: Translates spoken audio from other languages into English text while maintaining context and tone.
Others
Display:
- Rich Text Editor: A tool for creating and formatting text content with rich styling options.
- HTML Editor: Allows for direct coding or editing of HTML to customize layouts and designs.
Interactive:
- Pauses generation to get user input/selection before continuing.
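The pause-and-resume behavior can be pictured with a Python generator. This is a minimal sketch under stated assumptions: `build_story` and `run_with_input` are hypothetical names, and the tool's real mechanism is not documented here.

```python
from typing import Generator


def build_story() -> Generator[str, str, str]:
    """Hypothetical flow: ask a question, wait for the answer, then continue."""
    # Execution stops at this yield until the caller sends a value back.
    genre = yield "Which genre would you like?"
    return f"Generating a {genre} story..."


def run_with_input(answer: str) -> str:
    """Drive the flow, supplying the user's answer at the pause point."""
    flow = build_story()
    question = next(flow)  # run until the first pause
    print(question)
    try:
        flow.send(answer)  # resume with the user's selection
    except StopIteration as done:
        return done.value
    raise RuntimeError("flow paused again unexpectedly")


print(run_with_input("mystery"))
```

The generator keeps its local state while it waits, which mirrors how a generation step can stop for a selection and then pick up where it left off.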
Code:
- Provides coding capabilities for developing or refining applications directly in the tool.
URL Scraper:
- Extracts data from web pages by parsing their content, useful for research or automation tasks.
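To illustrate what parsing page content involves, here is a minimal sketch using only Python's standard library. It works on an HTML string that is already in memory; a real scraper would first fetch the page, for example with `urllib.request.urlopen`. The `TextExtractor` class and `extract_text` function are hypothetical names, not part of this tool.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text fragments joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
)
print(extract_text(sample))
```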
YouTube Transcript:
- Retrieves transcripts from YouTube videos, making them accessible for analysis or reference.
WebSearch (Exa):
- A search tool that scours the web for the latest information or resources.
Image Utility:
- Offers the ability to convert images to various formats.