How to use AI for automatic Shopify product categorization

Learn how to leverage ChatGPT and Cloudhooks to automatically categorize your Shopify products using Google's taxonomy, saving hours of manual work.

Managing product categories in Shopify is a time-consuming challenge. With over 6,000 categories in the Google Product Taxonomy, manually classifying each product can become overwhelming.

Our Shopify app Cloudhooks, combined with ChatGPT, can automate this entire process, potentially saving hours of manual categorization work.

In this tutorial, we'll walk you through setting up this automation step by step.

Time to complete: ~15 minutes
Difficulty level: Intermediate

Prerequisites

This tutorial assumes that you have:

  • Cloudhooks installed in your Shopify store
  • An OpenAI account with API billing set up (API usage is billed separately from a ChatGPT subscription)
  • Access to your OpenAI account

Set up OpenAI

First, we'll create an OpenAI project and generate the necessary API credentials. This setup enables AI-powered product categorization through OpenAI's API.

Create a project

Follow these steps:

1) Navigate to the OpenAI Platform Projects page

2) Click the "+Create" button

3) Name your project

4) Select the project.

Generate an API key

Next, we'll create a restricted API key that has only the permissions the hook needs:

1) Navigate to the "Dashboard"

2) Select "API keys"

3) Click "+Create new secret key" and configure these security settings:

  1. Owned by: You
  2. Name: Enter a descriptive name
  3. Permissions: Restricted
  4. Permission details:
    1. Models: Read
    2. Model capabilities: Write
    3. Assistants: Read
    4. Threads: Write
    5. Fine-tuning: None
    6. Files: None

⚠️ Important: Copy and securely store your API key immediately. You won't be able to see it again after leaving this page.

Create an assistant

Now we'll set up an AI assistant configured specifically for product categorization. This assistant will use the Google Product Taxonomy to classify your products.

Follow these steps:

1) Return to the "Dashboard"

2) Select "Assistants"

3) Click the “+Create” button to create a new assistant

Assistant configuration

Name

Fill it in as you see fit.

Instructions

Use the following instructions:

You are a product categorization expert. You receive a product description and you return a product category entry based on the attached Google Product Taxonomy text file.


The category entry is returned as a JSON object, which has three properties: "categoryId", "category", and "tags". The "categoryId" property contains the numeric identifier of the category. The "category" property contains the entire category description, starting with the general category all the way down to the specific category. The "tags" property is an array containing the pieces of the category, split at the ">" character.


Here are some examples of how a category translates into a JSON object.


Category: 3237 - Animals & Pet Supplies > Live Animals
JSON: {"categoryId": 3237, "category": "Animals & Pet Supplies > Live Animals", "tags": ["Animals & Pet Supplies", "Live Animals"]}


Category: 499954 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Bird Baths
JSON: {"categoryId": 499954, "category": "Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Bird Baths", "tags": ["Animals & Pet Supplies", "Pet Supplies", "Bird Supplies", "Bird Cage Accessories", "Bird Cage Bird Baths"] }


Always and only return the JSON, not descriptive text. A life of a human might depend on it if you don't return only a JSON.

Model

gpt-4o (you can change it later to “gpt-4o-mini” if you’d like to experiment with a cheaper alternative)

File search

Turn it on, and upload the attached Google Product Taxonomy text file.

Code interpreter

Leave it turned off.

Response format

Set it to “json_object”.
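
With this response format, the assistant's reply should parse into the shape described in the instructions. The sketch below shows one way to sanity-check a reply before using it. The helper name parseCategorization is hypothetical and not part of the hook code, and the "tagsMatch" flag is an assumption added for illustration.

```javascript
// Hypothetical helper: parse and sanity-check the assistant's reply.
function parseCategorization(responseText) {
  const parsed = JSON.parse(responseText); // throws on invalid JSON

  if (parsed === null || typeof parsed !== 'object') {
    throw new Error('Reply is not a JSON object');
  }
  if (!Number.isInteger(parsed.categoryId)) {
    throw new Error('Missing or non-numeric "categoryId"');
  }
  if (typeof parsed.category !== 'string' || parsed.category.length === 0) {
    throw new Error('Missing "category"');
  }
  if (!Array.isArray(parsed.tags) || parsed.tags.some(tag => typeof tag !== 'string')) {
    throw new Error('"tags" must be an array of strings');
  }

  // The instructions define "tags" as the category split at ">".
  const expectedTags = parsed.category.split('>').map(part => part.trim());
  const tagsMatch = expectedTags.length === parsed.tags.length &&
    expectedTags.every((tag, i) => tag === parsed.tags[i]);

  return { ...parsed, tagsMatch };
}

// Example reply taken from the instructions above.
const reply = '{"categoryId": 3237, "category": "Animals & Pet Supplies > Live Animals", "tags": ["Animals & Pet Supplies", "Live Animals"]}';
const result = parseCategorization(reply);
console.log(result.categoryId, result.tagsMatch);
```

A check like this could run right after the reply is retrieved, so that a malformed answer is logged instead of being written to the product.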

After configuring your assistant, copy and save the assistant ID - you'll need it for the Cloudhooks integration.

Set up the hook in Cloudhooks

Now we'll create a hook in Cloudhooks that automatically triggers the AI categorization whenever a new product is added to your store. This hook will:

  1. Detect when a new product is added
  2. Send the product details to your OpenAI assistant
  3. Receive the categorization response
  4. Apply the categories as tags to your product

Create the hook

Follow these steps:

1) From your Cloudhooks dashboard, click "Create hook" in the top-right corner.

2) Configure the trigger:

  • Open the "Trigger" tab
  • Navigate to the "Product" category
  • Select "A product is added"
  • Click "Select trigger"

3) Add the hook code:

  • Navigate to the “Hook” tab
  • Copy the code from the end of this article and replace the example hook code in the editor

4) Update the authentication credentials:

  • Locate the ASSISTANT_ID and API_KEY constants on lines 2 and 3 of the hook code
  • Replace the placeholder OpenAI Assistant ID with yours
  • Replace the placeholder OpenAI API key with yours

5) Configure hook settings:

  • Switch to the "Settings" tab
  • Enter a descriptive name (e.g., "AI Product Categorizer")
  • Toggle the hook to "Active"

6) Click "Save" in the top-right corner to finalize your hook configuration.

Testing your hook

When properly configured, your hook should appear active on the dashboard.

To verify your setup:

  1. Add a new product to your Shopify store
  2. Wait approximately 30 seconds
  3. Check the product's tags in Shopify
  4. Review the hook's execution logs in Cloudhooks

Understanding the hook

How it works

When a new product is added to Shopify, this hook:

  1. Receives the product data from Shopify
  2. Creates an AI categorization task with the assistant
  3. Waits for the AI to process the request
  4. Retrieves the categorization results
  5. Updates the product tags in Shopify

Detailed Process

The hook follows these steps. The number of Cloudhooks actions each step consumes is shown in parentheses:

  1. Checks if the product has already been categorized (via a special tag) and exits if it has
  2. Creates a thread for assistant communication (1 action)
  3. Sends the product details to the assistant for categorization (1 action)
  4. Initiates the assistant's processing run (1 action)
  5. Polls the assistant at set intervals until the response is ready (minimum 1 action)
  6. Retrieves the final categorization response (1 action)
  7. Updates the product's tags in Shopify, including a tag to mark it as categorized (1 action)

This process consumes a minimum of 6 Cloudhooks actions per product, with potential additional actions if multiple polling attempts are needed.
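
One detail of step 7 is worth knowing: in the hook code at the end of this article, the tag update keeps only those existing tags that start with "updated-by-", and the category tags replace the rest. If you would rather keep the tags already on the product, a variant could look like the sketch below. The helper mergeTags is hypothetical and not part of the original hook, and the "sale" and "summer" tags are made-up sample data.

```javascript
const TAG_CATEGORIZED = 'updated-by-product-categorizer';

// Hypothetical helper: merge existing Shopify tags with AI category tags.
// Shopify delivers product tags as one comma-separated string.
function mergeTags(existingTags, categoryTags) {
  const existing = (existingTags || '')
    .split(',')
    .map(tag => tag.trim())
    .filter(tag => tag.length > 0);

  // A Set removes duplicates while keeping insertion order.
  const merged = new Set([TAG_CATEGORIZED, ...existing, ...categoryTags]);
  return [...merged].join(', ');
}

console.log(mergeTags('sale, summer', ['Animals & Pet Supplies', 'Live Animals']));
// updated-by-product-categorizer, sale, summer, Animals & Pet Supplies, Live Animals
```

Whether to keep or replace existing tags depends on how your store uses tags, so treat this as a starting point rather than a recommendation.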

Customizing and optimizing

Tips for more accurate categorization

To improve categorization accuracy, provide specific examples in the assistant's instructions. For instance, if you run a niche online store, demonstrate which products belong in each category. This "show, don't tell" approach helps the assistant categorize items more effectively.
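
As a rough sketch of this idea, the snippet below builds an examples section that you could paste at the end of the assistant's instructions. The helper buildExamplesSection, the heading text, and the sample product name are all made up for illustration. Only the category itself comes from the examples earlier in this article.

```javascript
// Hypothetical helper: turn store-specific examples into text that can be
// appended to the assistant's instructions.
function buildExamplesSection(examples) {
  const blocks = examples.map(({ product, categoryId, category }) => {
    const json = JSON.stringify({
      categoryId,
      category,
      tags: category.split(' > ')
    });
    return `Product: ${product}\nJSON: ${json}`;
  });

  return ['Here are some examples from our store.', ...blocks].join('\n\n');
}

const section = buildExamplesSection([
  {
    product: 'Ceramic bird bath for cages', // made-up sample product
    categoryId: 499954,
    category: 'Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Bird Baths'
  }
]);
console.log(section);
```

Generating the examples from data keeps the "category" and "tags" values consistent with each other, which is easy to get wrong when typing them by hand.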

Hook customization options

You can customize several aspects of the hook's behavior to better suit your needs:

Timeout and polling settings

The hook periodically checks the AI assistant's progress until it receives a response. You can adjust these timing parameters in the pollRunStatus function (lines 58-59 of the hook code):

const timeout = 20000;    // Timeout is 20 seconds
const interval = 3000;    // Poll every 3 seconds

Important considerations:

  • Total execution time must stay under 30 seconds (Cloudhooks timeout limit)
  • More frequent polling means more API calls and higher costs
  • Leave enough headroom for the API calls made before and after polling
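
To check a pair of settings against these considerations, you can estimate the worst case for the polling loop. The sketch below assumes the loop structure of pollRunStatus in the hook code (poll, then sleep, repeated while the elapsed time is below the timeout) and ignores the latency of the HTTP requests themselves. The function name pollingBudget is hypothetical.

```javascript
const CLOUDHOOKS_LIMIT_MS = 30000; // 30-second limit mentioned above

// Rough budget for the polling loop, ignoring HTTP request latency.
function pollingBudget(timeout, interval) {
  // The loop polls at 0, interval, 2 * interval, ... while elapsed < timeout.
  const maxPolls = Math.ceil(timeout / interval);
  // The last iteration can start just before the timeout and still sleeps
  // for one full interval before the loop ends.
  const worstCaseMs = timeout + interval;
  return {
    maxPolls,
    worstCaseMs,
    headroomMs: CLOUDHOOKS_LIMIT_MS - worstCaseMs
  };
}

console.log(pollingBudget(20000, 3000));
// { maxPolls: 7, worstCaseMs: 23000, headroomMs: 7000 }
```

With the default values, up to 7 polls fit into the timeout, and about 7 seconds remain for the other API calls and for request latency.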

Cost optimization

You can optimize costs in several ways:

  1. AI Model Selection
    • gpt-4o: Best accuracy, highest cost
    • gpt-4o-mini: ~80% cheaper, slightly lower accuracy
  2. Polling Frequency
    • Decrease polling frequency to reduce API calls
    • Balance responsiveness vs. cost
    • Note: Each hook execution consumes a minimum of 6 Cloudhooks actions. Each additional polling request consumes one more action if the AI doesn't finish before the first poll.
  3. Batch Processing
    • Consider implementing batch processing for bulk product updates
    • Contact support for guidance on batch implementation
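
The action counts from the note above can be turned into a rough estimate. The sketch below assumes five fixed actions per product (create thread, send message, start run, fetch response, update tags) plus one action per poll, as described in the detailed process. The function names and the sample numbers (500 products, 2 polls each) are made up for illustration.

```javascript
// Fixed actions per product: create thread, send message, start run,
// fetch the response, update the Shopify tags.
const FIXED_ACTIONS = 5;

function actionsPerProduct(pollCount) {
  if (!Number.isInteger(pollCount) || pollCount < 1) {
    throw new RangeError('pollCount must be an integer >= 1');
  }
  return FIXED_ACTIONS + pollCount;
}

function actionsForCatalog(productCount, pollsPerProduct) {
  return productCount * actionsPerProduct(pollsPerProduct);
}

console.log(actionsPerProduct(1));      // 6
console.log(actionsForCatalog(500, 2)); // 3500
```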

Best Practices

Follow these guidelines for optimal results:

  1. Testing
    • Test with various product types
    • Monitor categorization accuracy
    • Review and adjust as needed
  2. Maintenance
    • Regularly update the taxonomy file
    • Monitor API usage and costs
    • Keep API keys secure and updated

Hook code


const ASSISTANT_ID = 'OPENAI_ASSISTANT_ID_HERE';
const API_KEY = 'OPENAI_API_KEY_HERE';

// Don't modify the code from here

const PREFIX_UPDATED_BY = 'updated-by-';
const TAG_CATEGORIZED = PREFIX_UPDATED_BY + 'product-categorizer';

function getApiHeaders() {
  const headers = {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${API_KEY}`,
    'OpenAI-Beta': 'assistants=v2'
  };

  return headers;  
}

async function createThread(postAction) {
  const url = `https://api.openai.com/v1/threads`;
  
  const {data} = await postAction(url, {}, {
    headers: getApiHeaders()
  });

  return data.id;
}

async function createCategorizationMessage(postAction, threadId, query) {
  const url = `https://api.openai.com/v1/threads/${threadId}/messages`;

  const {data} = await postAction(url, {
      role: "user",
      content: `What is the category of ${query}`
    }, {
      headers: getApiHeaders()
  });
  
  return data.id;
}

async function createRun(postAction, threadId, assistantId) {
  const url = `https://api.openai.com/v1/threads/${threadId}/runs`;
  
  const {data} = await postAction(url, {
      assistant_id: assistantId
    }, {
      headers: getApiHeaders()
  });

  return data.id;
}

async function pollRunStatus(getAction, threadId, runId) {
  const url = `https://api.openai.com/v1/threads/${threadId}/runs/${runId}`;
  
  const timeout = 20000;    // Timeout is 20 seconds
  const interval = 3000;    // Poll every 3 seconds
  const startTime = Date.now();

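  while (Date.now() - startTime < timeout) {
    try {
      const {data} = await getAction(url, {
        headers: getApiHeaders()
      });
      const status = data.status;
      console.log(`Current status: ${status}`);

      if (status === 'completed') {
        return true;
      }
      // Stop early on terminal statuses that can never become 'completed',
      // so no further actions are spent on polling
      if (['failed', 'cancelled', 'expired', 'incomplete'].includes(status)) {
        console.error(`Run ended with status: ${status}`);
        return false;
      }
    } catch (error) {
      // A failed request is logged and polling continues until the timeout
      console.error('Error while polling status:', error);
    }

    // Wait for the defined interval before polling again
    await new Promise(resolve => setTimeout(resolve, interval));
  }

  return false;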
}

function findAssistantMessage(jsonObject) {
  // Messages are listed newest first by default, so the first assistant
  // entry is the most recent reply
  for (const entry of jsonObject.data) {
    if (entry.role === "assistant") {
      return entry.content[0].text.value;
    }
  }

  return null;
}


async function getResponseMessage(getAction, threadId) {
  const url = `https://api.openai.com/v1/threads/${threadId}/messages`;
  
  const {data} = await getAction(url, {
    headers: getApiHeaders()
  });

  return findAssistantMessage(data);
}

async function updateProductTags(shopifyPutAction, productId, payloadTags, categoryTags) {
  const url = `/admin/products/${productId}.json`;

  // Keep only the existing tags added by hooks (prefixed with 'updated-by-');
  // all other existing tags are replaced by the category tags
  const updatedByTags = (payloadTags || '')
                          .split(', ')
                          .filter(tag => tag.startsWith(PREFIX_UPDATED_BY));
  // A Set removes duplicate tags
  const finalTags = new Set([TAG_CATEGORIZED, ...updatedByTags, ...categoryTags]);

  await shopifyPutAction(url, {
    product: {
      tags: [...finalTags].join(', ')
    }
  });
}


module.exports = async function(payload, actions, context) {
    try {
      console.log('Incoming tags: ', payload.tags);
      if (payload.tags && payload.tags.includes(TAG_CATEGORIZED)) {
        console.log('The product is already categorized by AI.');
        return;
      }

      const threadId = await createThread(actions.http.post);
      console.log('Thread id: ', threadId);
      
      const categorizationMessageId = await createCategorizationMessage(actions.http.post, threadId, payload.title);
      console.log('Categorization message id: ', categorizationMessageId);
    
      const runId = await createRun(actions.http.post, threadId, ASSISTANT_ID);
      console.log('Run id: ', runId);
      
      const runCompleted = await pollRunStatus(actions.http.get, threadId, runId);
      if (!runCompleted) {
        console.log('Run did not complete, returning');
        return;
      }
      console.log('Run completed successfully, continuing');
    
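      const responseMessage = await getResponseMessage(actions.http.get, threadId);
      if (!responseMessage) {
        console.log('No assistant message found, returning');
        return;
      }

      // With the "json_object" response format the reply should be valid JSON
      const categorization = JSON.parse(responseMessage);
      if (!Array.isArray(categorization.tags)) {
        console.log('Response has no "tags" array, returning');
        return;
      }

      await updateProductTags(actions.shopify.put, payload.id, payload.tags, categorization.tags);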
      console.log('Updated product tags.');
    } catch (err) {
      console.error('Error occurred:', err);
    }
}