The "Natural Language Categorization & Search" feature in Infinite Image Browsing (IIB) provides advanced capabilities for organizing and searching your image library based on the semantic content of their generation prompts. It leverages OpenAI-compatible large language models (LLMs) and embedding models to understand, categorize, and retrieve images in a more intuitive, natural language-driven way. This feature is currently experimental.
This feature allows you to:

- Automatically group images into semantic clusters based on their generation prompts.
- Generate human-readable titles and keywords for each cluster with an LLM.
- Search your library using natural language queries instead of exact keywords.
- Explore an LLM-driven Tag Graph of the themes in your images.
The core functionality of these AI features involves several steps:
- Prompt Extraction & Normalization: IIB reads the images' `.txt` files to extract the raw generation prompt, then normalizes (cleans) it before embedding.
- Embeddings: each normalized prompt is converted into a vector by the configured embedding model and stored in the `image_embedding` SQLite table.
- Clustering: the embedding vectors are grouped into clusters of semantically similar prompts, controlled by a similarity threshold and a minimum cluster size.
- Title Generation (LLM): a chat model generates a human-readable title and keywords for each cluster, which are cached in the `topic_title_cache` SQLite table.
- Retrieval (RAG-like Search): a natural language query is embedded with the same model and matched against the stored image vectors to return the most semantically similar images.
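The steps above can be sketched end to end. The `text_hash` formula matches the one given in the caching section; the normalization rule and the in-memory vector store are simplified stand-ins for illustration, not IIB's actual implementation:

```python
import hashlib
import math

NORMALIZE_VERSION = "v1"  # fingerprint of the normalization rules (illustrative)

def normalize_prompt(raw: str) -> str:
    # Stand-in for IIB's prompt cleaning: collapse whitespace, lowercase.
    return " ".join(raw.lower().split())

def text_hash(prompt_text: str) -> str:
    # Embedding cache key: sha256 over the normalization fingerprint + prompt.
    return hashlib.sha256(f"{NORMALIZE_VERSION}:{prompt_text}".encode()).hexdigest()

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, image_embeddings, top_k=5):
    """RAG-like retrieval: rank cached image vectors by similarity to the query."""
    scored = sorted(((cosine(query_vec, v), image_id)
                     for image_id, v in image_embeddings.items()), reverse=True)
    return [image_id for _, image_id in scored[:top_k]]

# Toy vectors standing in for real embedding-model output.
cache = {1: [1.0, 0.0], 2: [0.6, 0.8], 3: [0.0, 1.0]}
print(search([0.5, 0.9], cache, top_k=2))  # → [2, 3]
```

IIB persists the vectors in SQLite rather than holding them in memory, but the ranking principle is the same.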
To optimize performance and minimize API calls/costs, IIB employs a robust caching strategy:
Embedding Cache (`image_embedding` table):

- Storage: embedding vectors are stored in the `image_embedding` table, keyed by `image_id`.
- Reuse: a cached vector is reused when the same embedding model was used, the `text_hash` (derived from the normalized prompt and normalization version) is identical, and an existing vector is present. This prevents re-embedding unchanged prompts.
- Cache key: `text_hash = sha256(f"{normalize_version}:{prompt_text}")`. The `normalize_version` is a code-derived fingerprint of the normalization rules, ensuring cache invalidation when the rules change.
- Forcing a rebuild: pass `force=true` to `build_iib_output_embeddings`, or `force_embed=true` to `cluster_iib_output_job_start`.

Topic Title Cache (`topic_title_cache` table):
- Storage: generated titles and keywords are stored in the `topic_title_cache` table, keyed by `cluster_hash`.
- Reuse: when `use_title_cache=true` (the default) and `force_title=false`, titles and keywords are reused from the cache.
- Cache key components (`cluster_hash`): includes the member `image_id`s (sorted), embedding model, clustering threshold, `min_cluster_size`, title generation model, output `lang`, and the prompt normalization fingerprint (`normalize_version`) and mode. This ensures titles are only reused if all relevant parameters are unchanged.
- Forcing regeneration: set `force_title=true`.

Topic Cluster Cache (`topic_cluster_cache` table):
- Reuse: cached clustering results are reused when the underlying embeddings are unchanged (verified via the `embeddings_count` and `embeddings_max_updated_at` metadata) and the clustering parameters are identical.

These settings are crucial for enabling and fine-tuning the AI-powered features. They must be set as environment variables (e.g., in a `.env` file in the application's root directory). For more details, refer to the [Configuration Guide].
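The composite `cluster_hash` described above can also be sketched. IIB's exact serialization is an implementation detail; this hypothetical version only demonstrates the documented property that changing any listed component produces a different key and therefore a cache miss:

```python
import hashlib
import json

def cluster_hash(image_ids, embed_model, threshold, min_cluster_size,
                 title_model, lang, normalize_version, normalize_mode):
    # Canonical JSON over every component listed above; member image_ids
    # are sorted so the key is independent of member order.
    payload = json.dumps({
        "image_ids": sorted(image_ids),
        "embed_model": embed_model,
        "threshold": threshold,
        "min_cluster_size": min_cluster_size,
        "title_model": title_model,
        "lang": lang,
        "normalize_version": normalize_version,
        "normalize_mode": normalize_mode,
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Same members in a different order → same key; any parameter change → new key.
base = cluster_hash([2, 1], "embed-m", 0.8, 5, "title-m", "en", "v1", "balanced")
assert base == cluster_hash([1, 2], "embed-m", 0.8, 5, "title-m", "en", "v1", "balanced")
assert base != cluster_hash([1, 2], "embed-m", 0.8, 6, "title-m", "en", "v1", "balanced")
```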
API Credentials and Endpoints:
- `OPENAI_API_KEY`: Your API key for accessing OpenAI-compatible services.
- `OPENAI_BASE_URL`: The base URL for your OpenAI-compatible API endpoint (e.g., `https://api.openai.com/v1`).

AI Models:
- `AI_MODEL`: The default chat model (fallback for `TOPIC_TITLE_MODEL`).
- `EMBEDDING_MODEL`: The model for generating vector embeddings.
- `TOPIC_TITLE_MODEL`: The specific chat model for generating cluster titles and keywords.

Prompt Normalization:
- `IIB_PROMPT_NORMALIZE`: Enables or disables cleaning of prompts before embedding (`1` to enable, `0` to disable).
- `IIB_PROMPT_NORMALIZE_MODE`: Sets the aggressiveness of prompt cleaning (`balanced` or `theme_only`).

Tag Graph Generation:
- `IIB_TAG_GRAPH_MAX_TAGS_FOR_LLM`: Maximum number of tags sent to the LLM for abstraction.
- `IIB_TAG_GRAPH_TOPK_TAGS_FOR_LLM`: Top K tags by frequency/weight used as LLM input.
- `IIB_TAG_GRAPH_LLM_TIMEOUT_SEC`: Timeout for LLM requests during graph generation.
- `IIB_TAG_GRAPH_LLM_MAX_ATTEMPTS`: Maximum retry attempts for LLM calls.

The Tag Graph provides a hierarchical visualization of tag relationships derived from the clustering results, offering an LLM-driven abstraction of your image themes.
The graph is structured in multiple layers, similar to a neural network:
- `LayerNode`: Represents an entity in the graph (a cluster, a tag, or an abstract concept). It has an `id`, a `label`, and a `size` (representing its importance or image count).
- `GraphLayer`: Defines a level in the hierarchy, with a `level` number (0, 1, 2+), a `name` (e.g., "Clusters", "Tags", "Abstract-1"), and a list of `LayerNode`s.
- `GraphLink`: Represents a connection between two nodes (`source` and `target`) with an associated `weight`.

POST `/infinite_image_browsing/db/cluster_tag_graph`:
- Request: `folder_paths` (list of strings, required), `lang` (optional string, for LLM output language).
- Response: `layers`, `links`, and `stats`.

POST `/infinite_image_browsing/db/cluster_tag_graph_cluster_paths`:
- Returns the image paths belonging to a cluster, identified by `topic_cluster_cache_key` and `cluster_id`. This allows fetching cluster members on demand without embedding large path lists directly in the main graph response.
- Request: `topic_cluster_cache_key` (string, required), `cluster_id` (string, required).
- Response: a list of `paths` (strings).

These features provide powerful new ways to explore and manage your generated images, moving beyond simple keyword search to semantic understanding.
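To make the graph and endpoint shapes above concrete, here is a typed sketch. The field names follow the descriptions on this page, while the `tag_graph_request` helper is hypothetical, not IIB's client API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LayerNode:
    id: str
    label: str
    size: int            # importance / number of images

@dataclass
class GraphLayer:
    level: int           # 0, 1, 2+
    name: str            # e.g. "Clusters", "Tags", "Abstract-1"
    nodes: List[LayerNode] = field(default_factory=list)

@dataclass
class GraphLink:
    source: str
    target: str
    weight: float

def tag_graph_request(folder_paths: List[str], lang: Optional[str] = None) -> dict:
    """Request body for POST /infinite_image_browsing/db/cluster_tag_graph,
    per the parameters documented above (the helper itself is illustrative)."""
    body = {"folder_paths": folder_paths}
    if lang is not None:
        body["lang"] = lang
    return body

print(tag_graph_request(["/outputs/txt2img"], lang="en"))
# → {'folder_paths': ['/outputs/txt2img'], 'lang': 'en'}
```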