Saturday, February 15, 2025

How to Build a Simple SEO Application with DeepSeek R1


In this article, we'll explore a practical application of DeepSeek R1, demonstrating how this powerful local LLM can be used in a truly useful way, potentially changing how you think about SEO automation.

What are Local LLMs and Why DeepSeek R1?

Local LLMs are AI models that, unlike cloud-based services, run directly on your computer. This offers several advantages, including enhanced privacy, offline accessibility, and potentially lower costs in the long run. DeepSeek R1 is a leading open-source LLM known for its strong performance and efficiency, making it an excellent choice for local deployment.

To get started with DeepSeek R1, we'll be using Ollama, a tool that simplifies the process of running and managing LLMs on your local system. Think of Ollama as the engine that powers DeepSeek R1 on your machine.
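
As a quick sanity check (after you have installed Ollama and pulled the model, both covered in the step-by-step guide below), you can talk to DeepSeek R1 from Python via the ollama client library. This is a minimal sketch; the deepseek-r1:8b tag used here is an assumption, so pick whichever size you pulled:

Python
import ollama

# Ask the locally running DeepSeek R1 model a simple question.
# Note: DeepSeek R1 includes its reasoning in <think> tags before the answer.
response = ollama.chat(
    model='deepseek-r1:8b',
    messages=[{'role': 'user', 'content': 'In one sentence, what is a sitemap?'}],
)
print(response['message']['content'])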

Beyond Chatbots: Real-World SEO Applications

While interacting with local LLMs through simple text prompts can be interesting, their true potential lies in their ability to be integrated into automated workflows. Imagine creating custom SEO tools that run directly on your computer, tailored to your specific needs and without relying on external APIs.

This is where DeepSeek R1 shines. We can leverage its language processing capabilities to build scripts and workflows that automate various SEO tasks.

Building a Simple SEO Workflow: Web Scraping for Product Insights

Let's walk through a practical example: creating a workflow that scrapes product information from websites and extracts key details in a structured JSON format. This is incredibly useful for e-commerce businesses, market research, and content creation.

Here's the workflow we'll build:

  1. Sitemap Input: The workflow starts with a sitemap URL of a target website.
  2. URL Extraction: It extracts all URLs from the sitemap, intelligently handling different sitemap formats.
  3. Keyword-Based Filtering: Using DeepSeek R1, the workflow filters these URLs to identify those most relevant to a specific keyword (e.g., "best sneakers for men 2025").
  4. Webpage Scraping with Jina Reader: The relevant URLs are then passed to Jina Reader, a tool that excels at converting webpages into LLM-friendly text, extracting content and even images. We'll configure Jina Reader to exclude unnecessary header and footer content for cleaner results.
  5. JSON Output with DeepSeek R1: Finally, DeepSeek R1 is used again to process the scraped webpage content and extract key product information like pricing, descriptions, images, features, and specifications, outputting this data in a structured JSON format (see the example after this list).
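
To make the target concrete, here is a hypothetical example of the JSON the final step aims to produce. The field names match the extraction prompt used later in the script; the values are invented for illustration:

JSON
{
    "pricing": "$129.99",
    "description": "Lightweight running sneaker with a breathable mesh upper.",
    "images": ["https://www.example.com/images/sneaker-1.jpg"],
    "features": ["Breathable mesh", "Cushioned midsole"],
    "specifications": {
        "weight": "280 g",
        "sizes": "US 7-13"
    }
}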

Step-by-Step Guide to Replicate the Workflow:

  1. Install Ollama:

    • Visit Ollama's website and download the version for your operating system. Follow the installation instructions provided.
  2. Pull the DeepSeek R1 Model:

    • Open your command prompt or terminal.
    • Run the command: ollama run deepseek-r1:8b
    • Choose the model size via the tag (8b, 14b, 32b, or 70b). For most systems, the 8B model is a good starting point. Ollama will automatically download and install the model the first time the command runs.
  3. Create a Python Script:

    • Open a code editor like Visual Studio Code.
    • Create a new folder and a new Python file (e.g., seo_workflow.py).
    • Paste the following Python code into your file. This script is adapted from the example generated using Claude, focusing on clarity and SEO relevance:

    Python
    import json
    import re
    import requests
    from bs4 import BeautifulSoup
    from ollama import Client
    
    # Jina Reader Configuration (replace with your actual API key)
    JINA_READER_URL = "https://r.jina.ai/"
    JINA_API_KEY = "YOUR_JINA_API_KEY"  # Replace with your API key
    
    # Ollama Client Setup (defaults to the local server at http://localhost:11434)
    MODEL = "deepseek-r1:8b"  # Or another DeepSeek R1 tag you have pulled
    client = Client()
    
    def strip_think_tags(text):
        """Removes DeepSeek R1's <think>...</think> reasoning block from a response."""
        return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL).strip()
    
    def get_sitemap_urls(sitemap_url):
        """Fetches page URLs from a sitemap XML, recursing into sitemap index files."""
        urls = []
        try:
            response = requests.get(sitemap_url, timeout=30)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'xml')  # The 'xml' parser requires lxml
            # A sitemap index lists child sitemaps; recurse into each one.
            for sitemap_tag in soup.find_all('sitemap'):
                loc = sitemap_tag.find('loc')
                if loc and loc.string:
                    urls.extend(get_sitemap_urls(loc.string.strip()))
            # A regular sitemap lists page URLs directly in <loc> tags.
            for url_tag in soup.find_all('url'):
                loc = url_tag.find('loc')
                if loc and loc.string:
                    urls.append(loc.string.strip())
        except requests.exceptions.RequestException as e:
            print(f"Error fetching sitemap: {e}")
        except Exception as e:
            print(f"Error parsing sitemap: {e}")
        return urls
    
    def filter_relevant_urls_with_llm(urls, keyword):
        """Filters URLs using DeepSeek R1 for relevance to a keyword."""
        relevant_urls = []
        prompt = f"Given the following URLs, identify the ones that are most relevant to the keyword: '{keyword}'. Please provide only the relevant URLs, one URL per line.\n\n" + "\n".join(urls)
        response = client.chat(
            model=MODEL,
            messages=[
                {
                    'role': 'user',
                    'content': prompt,
                },
            ],
        )
        content = strip_think_tags(response['message']['content'])
        # Basic parsing - keep only lines that look like URLs
        for line in content.splitlines():
            line = line.strip()
            if line.startswith('http'):
                relevant_urls.append(line)
        return relevant_urls
    
    
    def scrape_page_with_jina(url):
        """Scrapes webpage content using the Jina Reader API."""
        headers = {
            'Authorization': f'Bearer {JINA_API_KEY}',
            'Accept': 'application/json',  # Ask Jina Reader for a JSON response
            'X-Remove-Selector': 'header,footer'  # Exclude header and footer content
        }
        try:
            # Jina Reader takes the target URL appended to its base URL.
            response = requests.get(f"{JINA_READER_URL}{url}", headers=headers, timeout=60)
            response.raise_for_status()
            jina_response = response.json()
            if jina_response.get('code') == 200:
                return jina_response['data']['content']  # Page content as markdown
            else:
                print(f"Jina Reader error for {url}: {jina_response.get('status', 'Unknown error')}")
                return None
        except requests.exceptions.RequestException as e:
            print(f"Error scraping {url} with Jina Reader: {e}")
            return None
        except (json.JSONDecodeError, KeyError):
            print(f"Error decoding JSON response from Jina Reader for {url}")
            return None
    
    def extract_product_info_with_llm(page_content):
        """Extracts product information in JSON format using DeepSeek R1."""
        if not page_content:
            return None
    
        prompt = f"Analyze the following webpage content and extract product information in JSON format. Include fields for 'pricing', 'description', 'images', 'features', and 'specifications'. Extract any other relevant product details you can find. If information is not available, mark it as null.\n\nContent:\n{page_content}"
    
        response = client.chat(
            model=MODEL,
            messages=[
                {
                    'role': 'user',
                    'content': prompt,
                },
            ],
            format='json'  # Constrain the model's output to valid JSON
        )
        try:
            # Attempt to parse JSON, handle potential errors
            return json.loads(response['message']['content'])
        except json.JSONDecodeError:
            print("Error decoding JSON from DeepSeek R1. Raw response content:")
            print(response['message']['content'])  # Print raw content for debugging
            return None
    
    
    if __name__ == "__main__":
        sitemap_url_input = input("Enter sitemap URL: ")
        keyword_input = input("Enter keyword for URL filtering: ")
    
        urls = get_sitemap_urls(sitemap_url_input)
        if not urls:
            print("No URLs found in sitemap or error fetching sitemap.")
        else:
            print(f"Fetching {len(urls)} URLs from sitemap...")
            relevant_urls = filter_relevant_urls_with_llm(urls, keyword_input)
            print(f"Found {len(relevant_urls)} relevant URLs after filtering with DeepSeek R1.")
    
            for url in relevant_urls:
                print(f"\nScraping and processing: {url}")
                page_content = scrape_page_with_jina(url)
                if page_content:
                    product_info_json = extract_product_info_with_llm(page_content)
                    if product_info_json:
                        print(json.dumps(product_info_json, indent=4))
                    else:
                        print(f"Failed to extract JSON data from {url}")
                else:
                    print(f"Failed to scrape content from {url}")
    
    • Install Dependencies: Open your terminal in the script's folder and run (lxml provides the XML parser BeautifulSoup needs for sitemaps):
      Bash
      pip install ollama beautifulsoup4 requests lxml
      
    • Get a Jina API Key: Sign up at jina.ai to obtain a free API key and replace "YOUR_JINA_API_KEY" in the script.
  4. Run the Script:

    • In your terminal, run the script: python seo_workflow.py
    • Enter the sitemap URL when prompted (e.g., https://www.example.com/sitemap.xml).
    • Enter your keyword (e.g., best men's sneakers 2025).
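
If you want to keep the extracted data rather than only print it, here is a small optional extension (a hypothetical addition, not part of the original workflow) that collects each page's JSON and writes everything to a results.json file:

Python
# Hypothetical extension: collect the extracted data per URL and save it to disk.
results = {}
for url in relevant_urls:
    page_content = scrape_page_with_jina(url)
    if page_content:
        product_info = extract_product_info_with_llm(page_content)
        if product_info:
            results[url] = product_info

with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=4)
print(f"Saved product data for {len(results)} URLs to results.json")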

Limitations and Future Enhancements

As demonstrated in the original video, DeepSeek R1 executes the workflow successfully, but its accuracy in identifying relevant products from keywords can be less refined than that of models like Anthropic's Claude.

Potential improvements include:

  • Prompt Engineering: Experiment with more specific and detailed prompts for DeepSeek R1 to enhance its filtering and data extraction accuracy. For example, instruct it to prioritize URLs with specific keywords in the slug (see the sketch after this list).
  • Model Selection: Consider testing larger DeepSeek R1 models (14B, 70B) or other local LLMs that might offer better performance for this specific task, if your system resources allow.
  • Batch Size Optimization: Adjusting batch sizes in the script might improve performance and accuracy.
  • Refined URL Filtering: Implement more robust URL parsing and filtering logic beyond keyword matching.
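
To illustrate the first suggestion, here is one way the filtering prompt could be tightened. This is a hypothetical variant, not the prompt from the video; it would replace the prompt built inside filter_relevant_urls_with_llm:

Python
# Hypothetical refined prompt: steer the model toward product pages whose URL
# slug contains the keyword's terms, and away from non-product pages.
prompt = (
    f"You are filtering URLs for SEO research on the keyword: '{keyword}'.\n"
    "Rules:\n"
    "1. Prefer product or category pages whose URL slug contains words from the keyword.\n"
    "2. Exclude blog posts, about pages, and legal or policy pages.\n"
    "3. Return only the matching URLs, one per line, with no commentary.\n\n"
    "URLs:\n" + "\n".join(urls)
)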

Expanding SEO Automation with Local LLMs

This simple product scraper is just the tip of the iceberg. Local LLMs like DeepSeek R1 can be leveraged for a wide range of SEO automation tasks (a minimal sketch of the first one appears after this list), such as:

  • Automated Content Generation: Generating blog posts, articles, and product descriptions.
  • Keyword Research and Analysis: Identifying relevant keywords and analyzing search trends.
  • SEO Audits: Analyzing website content for SEO optimization opportunities.
  • Competitor Analysis: Scraping and analyzing competitor websites for SEO insights.
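
As a minimal sketch of the first idea, the same local setup can draft a meta description for a page. The function name, prompt, and model tag here are illustrative assumptions, not part of the workflow above:

Python
import re
import ollama

def generate_meta_description(page_title, page_summary):
    """Drafts an SEO meta description for a page using a local DeepSeek R1 model."""
    prompt = (
        f"Write an SEO meta description under 160 characters for a page titled "
        f"'{page_title}'. Page summary: {page_summary}. Return only the description."
    )
    response = ollama.chat(
        model='deepseek-r1:8b',  # Assumed tag; use whichever size you pulled
        messages=[{'role': 'user', 'content': prompt}],
    )
    # DeepSeek R1 prepends its reasoning in <think> tags; strip it from the answer.
    content = response['message']['content']
    return re.sub(r'<think>.*?</think>', '', content, flags=re.DOTALL).strip()

print(generate_meta_description(
    "Best Men's Sneakers 2025",
    "A roundup of the top running and lifestyle sneakers for men this year."
))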

Conclusion

Local LLMs are democratizing access to powerful AI tools, opening up exciting new possibilities for SEO professionals and businesses. While still evolving, models like DeepSeek R1 offer a compelling platform for building custom, privacy-focused, and cost-effective SEO automation workflows.

Experiment with the workflow outlined in this article, explore different prompts and models, and unlock the potential of local LLMs to transform your SEO strategies. Try it yourself and share your experiences in the comments below!
