Keyword-Topic Categorizer Python Script


Overview:

The Keyword-Topic Categorizer is a Python script that assigns each keyword in a list to the most relevant topic from a set of predefined topics. You can use it, for example, to sort the search terms from your website or a list of product keywords under topics you define.

Features:

  1. Multilingual Support: Categorizes keywords in English, German, French, Spanish, Russian, and Chinese.
  2. Dynamic Model Loading: Detects each keyword's language and processes the keyword with the matching spaCy language model, falling back to English when needed.
  3. Batch Processing: Processes keywords in batches so that large datasets can be handled efficiently.
  4. Debug Mode: Provides detailed insights into the categorization process for each keyword.
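The model selection and batching described above could look roughly like the following minimal sketch. It assumes language codes in the format returned by langdetect (for example "de" or "zh-cn"). The model names come from the installation steps below; the names SPACY_MODELS, model_for, and batches are hypothetical and are not taken from match.py.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical mapping from detected language code to the spaCy models
# listed in the installation steps.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "es": "es_core_news_sm",
    "fr": "fr_core_news_sm",
    "ru": "ru_core_news_sm",
    "zh": "zh_core_web_sm",
}


def model_for(lang_code: str) -> str:
    """Return the spaCy model name for a language code, defaulting to English."""
    # langdetect reports Chinese as "zh-cn" or "zh-tw", so keep only the base code.
    base = lang_code.split("-")[0].lower() if lang_code else ""
    return SPACY_MODELS.get(base, SPACY_MODELS["en"])


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

With this approach, an unsupported or undetected language such as "it" or an empty string simply maps to en_core_web_sm, matching the English fallback described in the note at the end.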

Installation:

Prerequisites:

  • Python 3.x
  • pip (Python package installer)

Steps:

  1. Clone or download the repository containing the script.
  2. Navigate to the script's directory in the terminal or command prompt.
  3. Install the required Python packages using the following commands:
    pip install pandas tqdm spacy langdetect
  4. Download the necessary spaCy language models:
    python -m spacy download en_core_web_sm 
    python -m spacy download de_core_news_sm 
    python -m spacy download es_core_news_sm 
    python -m spacy download fr_core_news_sm 
    python -m spacy download ru_core_news_sm 
    python -m spacy download zh_core_web_sm
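Before running the script, you may want to confirm that all six models are installed. The following minimal sketch uses only the standard library and relies on the fact that spaCy models are installed as regular Python packages; missing_models is a hypothetical helper, not part of the script.

```python
import importlib.util
from typing import Iterable, List

# Model package names from the installation steps above.
REQUIRED_MODELS = [
    "en_core_web_sm",
    "de_core_news_sm",
    "es_core_news_sm",
    "fr_core_news_sm",
    "ru_core_news_sm",
    "zh_core_web_sm",
]


def missing_models(names: Iterable[str]) -> List[str]:
    """Return the package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models(REQUIRED_MODELS)
    if missing:
        print("Missing spaCy models:")
        for name in missing:
            print(f"  python -m spacy download {name}")
    else:
        print("All spaCy models are installed.")
```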


Usage:

  1. Prepare Your Data:
    • Ensure you have two text files: keywords.txt (one keyword per line) and topics.txt (one topic per line).
    • Place both files in the same directory as the script.
  2. Run the Script:
    • Navigate to the script's directory in the terminal or command prompt.
    • Execute the script:
      python match.py
  3. View Results:
    • When the script finishes, it writes a file named results.csv to the same directory. The file has two columns, "keyword" and "category", and pairs each keyword from keywords.txt with its closest matching topic from topics.txt.
  4. Debug Mode (Optional):
    • If you wish to see a detailed breakdown of how each keyword is categorized, set the DEBUG_MODE variable in the script to True. When you run the script in this mode, it will print diagnostic information for each keyword.
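The file handling in these steps might be sketched as follows. This is an illustration under assumptions, not the script's actual code: it uses the standard csv module instead of pandas, and parse_lines, to_csv, run, and the categorize callback are hypothetical names.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def parse_lines(text: str) -> List[str]:
    """Split file contents into entries, one per line, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def to_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Render (keyword, category) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["keyword", "category"])
    writer.writerows(rows)
    return buf.getvalue()


def run(directory: Path, categorize: Callable[[str, List[str]], str]) -> None:
    """Read keywords.txt and topics.txt from `directory` and write results.csv."""
    keywords = parse_lines((directory / "keywords.txt").read_text(encoding="utf-8"))
    topics = parse_lines((directory / "topics.txt").read_text(encoding="utf-8"))
    rows = [(keyword, categorize(keyword, topics)) for keyword in keywords]
    (directory / "results.csv").write_text(to_csv(rows), encoding="utf-8")
```

Passing the categorization function in as a parameter keeps the file handling separate from the matching logic, so either part can be tested on its own.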

Note: Categorization is based on token overlap between keywords and topics, so make sure your topics are representative of the categories you want to create. When a keyword's language cannot be determined or is not supported, the script defaults to English. Depending on your use case or domain, you may need to adjust the topics or the script itself.
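To illustrate what "token overlap" means, here is a minimal sketch that scores each topic by the number of lowercase word tokens it shares with the keyword. It substitutes a simple regular-expression tokenizer for spaCy, so unlike a pipeline that compares lemmas it will not match "shoe" with "shoes", and it does not segment Chinese text. The function names and the tie-breaking rule (the earliest topic wins) are assumptions.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Stand-in for spaCy tokenization: lowercase runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(keyword: str, topic: str) -> int:
    """Number of distinct tokens the keyword and topic have in common."""
    return len(tokens(keyword) & tokens(topic))


def best_topic(keyword: str, topics: List[str], debug: bool = False) -> str:
    """Return the topic with the largest token overlap with the keyword."""
    if not topics:
        raise ValueError("at least one topic is required")
    scores = [(overlap(keyword, topic), topic) for topic in topics]
    if debug:
        # Rough analogue of the script's DEBUG_MODE output.
        for score, topic in scores:
            print(f"{keyword!r} vs {topic!r}: overlap={score}")
    # max() returns the first maximal element, so ties go to the earliest topic.
    return max(scores, key=lambda pair: pair[0])[1]
```

For example, best_topic("buy running shoes online", ["Electronics", "Running Shoes", "Books"]) returns "Running Shoes", because only that topic shares any tokens with the keyword.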

Size: 3.51 KB