Santiago

Posted on May 21

Turning Text into Audio: A Journey with Text2AudioBook

#programming #python #opensource #automation

1. Introduction

As an avid audiobook listener, I always wondered how my own writing would sound if narrated. This curiosity turned into a necessity as I penned my latest book. Existing tools were limited, in particular by the 4000-token cap that ChatGPT sets for each audio file, so I embarked on creating Text2AudioBook, a tool that transforms entire chapters into seamless audio files. Manually copying and pasting a text of 50,000 to 100,000 tokens (equivalent to 30,000–70,000 words or 100–250 pages) in smaller chunks was tedious and time-consuming: at 4000 tokens per chunk, that is 13 to 25 separate conversions per text. That motivated me to develop a better solution.

2. Background

Traditional methods for converting text to audio often come with significant limitations, such as a restrictive token limit that disrupts the continuity of the narration. Imagine the frustration of having to split a lengthy manuscript into numerous small segments. This process not only disrupts the flow of the text but also consumes valuable time that could be better spent on creative tasks.

3. Introducing Text2AudioBook

What is Text2AudioBook? It's a tool that eliminates the limitations of token counts by parsing any amount of text into manageable chunks. These chunks are then converted into audio and seamlessly stitched together, creating a smooth and continuous narration.
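
The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the helper name `split_text_into_chunks` and the `max_chars` parameter are hypothetical, and the limit is counted in characters as a simple stand-in for the token cap (counting real tokens would need a tokenizer). It breaks at sentence ends so the narration is not cut mid-sentence.

```python
import re


def split_text_into_chunks(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    Hypothetical helper: max_chars is an assumed stand-in for the real limit.
    """
    # Split after '.', '!' or '?' followed by whitespace; punctuation stays
    # attached to its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut hard into pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the chunk being built.
            current = candidate
        else:
            # Close the current chunk and start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be converted to audio on its own, in order, which is what makes the later stitching step straightforward.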

4. Key Benefits/Advantages

No Token Limit: Unlike other tools that impose a 4000-token limit, Text2AudioBook can handle any length of text. This is particularly useful for authors and researchers who need to convert entire chapters or lengthy documents.
Automated Parsing and Stitching: The tool automatically parses the text into smaller segments, converts each segment into audio, and stitches them together into a coherent audio file. This automation saves time and reduces manual effort.
User-Friendly Python Implementation: The project is implemented in Python, a language known for its readability and ease of use. This makes the tool accessible to a wide range of users, from beginners to advanced programmers.
Reliable Audio Conversion: By leveraging ChatGPT for text-to-audio conversion, the tool benefits from continuous improvements in AI technology, so the audio quality improves as the underlying models do.
Simple GUI with Tkinter: The graphical user interface is built using Tkinter, a standard Python library for creating GUIs. This makes the tool easy to use, even for those who are not familiar with coding.

5. Practical Example/Case Study

Imagine an author working on their latest novel. They want to hear how a particular chapter sounds when read aloud to better gauge its impact and flow. With Text2AudioBook, they can simply input the entire chapter, and the tool will convert it into a single, coherent audio file. This allows the author to listen to their work in one sitting, making it easier to identify areas for improvement.

Additionally, consider someone who wants to study a topic but lacks an audio recording of the material. They could convert the text to audio and listen to it while driving or walking. Living in Los Angeles, I know how bad traffic can be, often turning commutes into hour-long ordeals. I'd rather reclaim that time by learning something new than just listening to the radio.

6. Development Journey

The creation of Text2AudioBook was not straightforward. I went through several iterations, scrapping each one as I encountered issues. Initially, I tried to do it manually, but adding new features was slow and cumbersome. I then experimented with agent workflows, but many were immature, buggy, or unsupported. Recently, with the release of ChatGPT-4.0, coding became much easier. I utilized it to clean up my code and add new features like logging and a basic GUI. This iterative process greatly improved the final product.

7. Key Function: convert_text_chunk_to_speech

One of the core functions of Text2AudioBook is convert_text_chunk_to_speech. This function takes a chunk of text, sends it to OpenAI's API for conversion into speech, and returns the resulting audio as a pydub AudioSegment. Here's a brief look at the code:

import io

from openai import OpenAI
from pydub import AudioSegment

def convert_text_chunk_to_speech(text_chunk, api_key):
    # Assumes the openai>=1.0 client and its text-to-speech endpoint; the
    # model and voice below are illustrative choices, not the tool's settings
    client = OpenAI(api_key=api_key)
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text_chunk,
        response_format="mp3",
    )

    # The endpoint returns the audio bytes directly, so no separate download is needed
    audio_bytes = response.content

    # Decode the MP3 bytes into a pydub segment suitable for stitching
    audio_segment = AudioSegment.from_file(io.BytesIO(audio_bytes), format="mp3")
    return audio_segment

This function handles one chunk at a time. The tool first breaks a large text into manageable chunks, calls the function on each chunk to get an audio segment, and then stitches the segments together into a single cohesive audio file.
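
The stitching step can be sketched in a few lines. This is an illustration under assumptions rather than the tool's own code: `stitch_segments` is a hypothetical name, and it relies only on the segments supporting `+` for concatenation, which pydub's AudioSegment does.

```python
from functools import reduce
from operator import add


def stitch_segments(segments):
    """Concatenate audio segments in the order given.

    Hypothetical helper: works with any objects that support `+`
    concatenation, such as pydub AudioSegment instances.
    """
    segments = list(segments)
    if not segments:
        raise ValueError("no segments to stitch")
    # a + b + c ... preserves the original chunk order.
    return reduce(add, segments)
```

With pydub segments, the combined result can then be written out with its `export` method, for example `combined.export("chapter.mp3", format="mp3")`, where the file name is only a placeholder.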

Audio Quality and Format: The function loads the returned MP3 data into a pydub AudioSegment, the format used for stitching. Decoding every chunk the same way keeps the final audio consistent from one segment to the next.

Error Handling: Robust error handling is crucial for managing issues like API failures or network errors, and the function as shown does not yet include any. Future enhancements could include retries for failed requests and detailed logging for debugging purposes.
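
As one possible shape for the retries and logging mentioned above, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the helper name `call_with_retries`, the logger name, and the default attempt count and delays are not taken from the tool. It catches every exception for brevity; real code would probably narrow this to network and API errors.

```python
import logging
import time

logger = logging.getLogger("text2audiobook")  # hypothetical logger name


def call_with_retries(func, *args, attempts=3, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper: retries on any exception and re-raises the last
    one once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if attempt == attempts:
                raise
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs",
                           attempt, exc, delay)
            sleep(delay)
```

A chunk conversion could then be wrapped as `call_with_retries(convert_text_chunk_to_speech, text_chunk, api_key)`, so that one failed request does not abort an entire chapter.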

Performance Optimization: The function keeps its local work light by decoding the audio in memory with pydub instead of writing temporary files. In practice, most of the running time is typically spent waiting on the API rather than on local processing.

Integration with GUI: The function integrates seamlessly with the Tkinter GUI, providing a user-friendly interface for inputting text and generating audio. Users can interact with the tool through the GUI, making the process accessible even for those with minimal coding experience.

Future Enhancements: Future versions may include additional features such as real-time progress updates and customizable audio settings, enhancing user experience and functionality.

8. Impact on Writing Approach

Text2AudioBook has significantly impacted my writing approach. It's not just about saving time; it's about enhancing my understanding of my own work. Listening to my book allows me to grasp the nuances of my writing better. I often edit sections after hearing them in audio format, leading to more refined and effective prose. This auditory review process has become an integral part of my editing routine, improving both the quality and coherence of my writing.

9. Future Plans

The journey doesn't stop here. Future versions of Text2AudioBook will include the ability to prepare video files. This feature will append images to the audio, making it suitable for platforms like YouTube. By adding a visual element, the tool will make the content more engaging and accessible to a broader audience.

10. Conclusion

Text2AudioBook has transformed the way I approach my writing. It's freed me from the tedious task of splitting text into small segments and provided a seamless way to convert entire chapters into audio. My hope is that others will find it equally useful, whether for personal projects or as a preliminary step before hiring professional narrators. I believe this tool will help many discover new ways to enjoy and review their written work.

11. Explore Text2AudioBook

I encourage you to try Text2AudioBook for your next project by checking out the Text2AudioBook GitHub repository. Whether you're an author, researcher, or anyone with a large text to convert, this tool can save you time and enhance your workflow.
