This PromptForm tutorial video shows how to add a text-to-speech feature that converts your app's articles into audio or podcast versions. Users can listen to the content directly on your site, or you can distribute it across podcast platforms to generate backlinks to your app. The video walks through setting up a text-to-speech prompt with the ElevenLabs multilingual model, selecting a voice, structuring the content, and marking inputs as required so the prompt runs correctly. It also covers long-form text handling so that lengthy articles are processed smoothly. By the end of the video, you will know how to generate and download a podcast directly from the articles produced by PromptForm.


Topics Covered:

1. Introduction to Audio/Podcast Feature:

  • Transform written articles into audio/podcast format.

2. Setting Up Prompts:

  • Navigating to the prompt section after content creation.
  • Adding and naming a new prompt.

3. Selecting Text-to-Speech Model:

  • Choosing "eleven labs text to speech multilingual v2".
  • Previewing and selecting the desired voice for the audio.

4. Configuring Text Inputs:

  • Adding titles and sections of the blog post.
  • Including backlinks by adding a prompt for the website URL.

5. Making Inputs Required:

  • Ensuring all necessary inputs are provided for proper functionality.

6. Generating the Podcast:

  • Adding instructions so the narration mentions the website where the article was originally posted.
  • Enabling long text processing for articles exceeding 10,000 tokens.

7. Final Steps:

  • Running the prompt to create the audio file.
  • Downloading the generated audio file from the interface.
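
For readers who want to see the idea behind steps 4 to 6 outside the PromptForm interface, the following is a minimal Python sketch, not PromptForm's or ElevenLabs' actual implementation. It assembles a narration script from required inputs (title, sections, website URL) and splits long text into smaller pieces at sentence boundaries before it would be sent to a text-to-speech model. The function names `build_script` and `split_for_tts` and the `max_chars` limit are hypothetical; the video describes the long text threshold in tokens, and characters are used here only as a simple stand-in.

```python
import re


def build_script(title: str, sections: list[str], website_url: str) -> str:
    """Join the inputs into one narration script (hypothetical helper)."""
    # Treat every input as required, mirroring the "required inputs" step.
    if not title.strip():
        raise ValueError("title is required")
    if not sections or not all(s.strip() for s in sections):
        raise ValueError("at least one non-empty section is required")
    if not website_url.strip():
        raise ValueError("website_url is required")
    body = "\n\n".join(s.strip() for s in sections)
    # Closing line that mentions where the article was originally posted.
    outro = f"This article was originally posted on {website_url.strip()}."
    return f"{title.strip()}\n\n{body}\n\n{outro}"


def split_for_tts(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    max_chars is an assumed limit, not a documented PromptForm setting.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be converted to audio separately and the resulting clips joined in order. Splitting at sentence boundaries keeps the narration from being cut mid-sentence wherever possible.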


By following the steps in this video, you can add audio and podcast creation to your PromptForm applications, reach a wider audience, and increase user engagement.


1. Building Your First App (17:56)
2. Interactive Elements (5:20)
3. Image Generation (4:28)
4. Text to Speech (4:19)
5. Audio & Transcription (5:01)
6. Conditional Inputs (2:38)
7. RAG Documents (7:46)