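vistopics (Topic Visualization for Visuals) is a Python package for video and image processing. Its features include video downloading, frame extraction, duplicate frame reduction, image downloading from URLs, and caption generation.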
This package is designed for developers, researchers, and data scientists working on media processing, visualization, or clustering tasks.
Install vistopics from PyPI using:
pip install vistopics
If you plan to use FastDup for duplicate frame detection, install with:
pip install vistopics[fastdup]
Alternatively, install it directly from the source:
git clone https://github.com/aysedeniz09/VisTopics
cd VisTopics
pip install .
Note: The repository name on GitHub is VisTopics (capitalized), but the package name and Python import name are lowercase vistopics.
The base Python dependencies are listed in requirements.txt.
Note: To use FastDup-based functionality (limiting_frames), you must additionally install:
pip install vistopics[fastdup]
Install base dependencies with:
pip install -r requirements.txt
Video workflow
1. Video Scraping
from vistopics import video_download
video_download(
    input_df_path="test_data.csv",
    output_df_path="cleaned_videos.csv",
    output_dir="downloaded_videos",
    link_column="Link",
    title_column="Page Name"
)
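The call above reads test_data.csv, so that file needs the columns named by link_column and title_column. As a minimal sketch, assuming those two columns are all the function requires (the README does not say otherwise), an input file could be built with the standard library. The helper name write_input_csv, the URLs, and the page names are placeholders for illustration, not part of vistopics.

```python
import csv

# Placeholder rows: the URLs and page names are illustrative, not real data.
rows = [
    {"Link": "https://example.com/video1", "Page Name": "Example Page A"},
    {"Link": "https://example.com/video2", "Page Name": "Example Page B"},
]

def write_input_csv(path, rows, link_column="Link", title_column="Page Name"):
    """Write a CSV whose header matches the link and title column names."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[link_column, title_column])
        writer.writeheader()
        writer.writerows(rows)

# Produces the file that video_download(input_df_path="test_data.csv", ...) reads.
write_input_csv("test_data.csv", rows)
```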
2. Frame Extraction
from vistopics import extract_frames
extract_frames(
    videofolder="downloaded_videos",
    images_folder="images",
    frame_rate=1
)
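After extraction it can be useful to check how many frames were written before moving on to the later steps. The sketch below is a hypothetical helper, not part of vistopics. It assumes frames are saved as common image file types somewhere under the images folder, which the README does not specify.

```python
from pathlib import Path

# Assumption: extracted frames use one of these file extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_frames(images_folder):
    """Count image files under images_folder, including subfolders."""
    root = Path(images_folder)
    if not root.is_dir():
        return 0
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

print(count_frames("images"))
```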
3. Duplicate Frame Reduction
from vistopics import limiting_frames
limiting_frames(
    path="images",
    output_file="reduced_frame_list.csv",
    ccthreshold=0.8
)
This step requires the optional fastdup dependency:
pip install vistopics[fastdup]
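Because fastdup is optional, a script may want to confirm it is present before calling limiting_frames. The following is a minimal sketch using only the standard library. The function name fastdup_available is hypothetical, and the check only tests whether a module named fastdup can be found, not whether it works.

```python
import importlib.util

def fastdup_available():
    """Return True if a module named fastdup can be located."""
    return importlib.util.find_spec("fastdup") is not None

if not fastdup_available():
    print("fastdup is not installed; run: pip install vistopics[fastdup]")
```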
4. Caption Generation
from vistopics import get_caption
get_caption(
    mykey="your-open-ai-api-key",
    path_in="images",
    captions_file="captions_file.csv",
    model="gpt-4o-mini"
)
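Rather than writing the API key into a script, the key can be read from an environment variable and passed as mykey. This is a sketch under assumptions: the helper load_api_key is hypothetical, and the variable name OPENAI_API_KEY is a common convention rather than something vistopics is documented to read by itself.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in env_var, or raise if it is missing or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Usage (commented out so this block runs without vistopics installed):
# from vistopics import get_caption
# get_caption(
#     mykey=load_api_key(),
#     path_in="images",
#     captions_file="captions_file.csv",
#     model="gpt-4o-mini",
# )
```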
Image workflow
1. Download Images from URLs
from vistopics import download_images_from_url
download_images_from_url(
    input_csv="urls.csv",  # must have a 'url' column
    output_csv="captions.csv",
    image_dir="images"
)
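Since the input CSV must have a 'url' column, a quick check before downloading can catch a mislabeled file early. The helper below is a hypothetical sketch, not part of vistopics. It only inspects the header and counts non-empty values; it does not verify that the URLs are reachable.

```python
import csv

def check_url_column(csv_path, column="url"):
    """Return the number of non-empty values in column; raise if it is absent."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise ValueError(f"{csv_path} has no '{column}' column")
        # Rows shorter than the header yield None, so fall back to "".
        return sum(1 for row in reader if (row[column] or "").strip())
```

Calling check_url_column("urls.csv") before download_images_from_url gives the number of rows that will be attempted, or an error naming the missing column.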
2. Caption Generation
from vistopics import get_caption
get_caption(
    mykey="your-open-ai-api-key",
    path_in="images",
    captions_file="captions_file.csv",
    model="gpt-4o-mini"
)
This project is licensed under the MIT License. See the LICENSE file for details.
We welcome contributions! If you’d like to contribute:
1. Create a branch for your change:
git checkout -b feature-name
2. Commit your changes:
git commit -m "Add new feature"
3. Push the branch:
git push origin feature-name
vistopics/                          # Python package
    __init__.py
    captioning.py
    extract_frames.py
    image_download.py
    reduce_frame.py
    video_scrape.py
paper/                              # Paper replication code
    python/
        study1_videos.py            # Video processing for Study 1
        study2_images.py            # Image processing for Study 2
    R/
        study1_videos_lda.R         # LDA on video frame captions (Study 1)
        study1_transcripts_lda.R    # LDA on transcripts (Study 1)
        study2_images_lda.R         # LDA on image captions (Study 2)
LICENSE
README.md
APA citation:
Lokmanoglu, A. D., & Walter, D. (2025, accepted). Topic Modeling of Video and Image Data: A Visual Semantic Unsupervised Approach. Communication Methods and Measures.
BibTeX citation:
@article{lokmanogluwalter2025topic,
  title={Topic Modeling of Video and Image Data: A Visual Semantic Unsupervised Approach},
  author={Lokmanoglu, Ayse D. and Walter, Dror},
  journal={Communication Methods and Measures},
  year={2025},
  note={Accepted}
}
The paper/ folder contains all code and workflows for replicating the analyses in our studies.
Note: The paper code is a mix of R (for topic modeling and statistical analysis) and Python (for preprocessing and caption generation).
You will need R ≥ 4.2; see the individual script headers for full package requirements.
Study 1 (videos)
Preprocessing & Captioning (Python)
paper/python/study1_videos.py
Samples videos, extracts frames, reduces duplicates with FastDup, and generates captions using vistopics.
Topic Modeling (R)
paper/R/study1_videos_lda.R
Runs LDA on video-level frame captions from the Study 1 dataset.
paper/R/study1_transcripts_lda.R
Runs LDA on transcripts from the Study 1 dataset.
Study 2 (images)
Preprocessing & Captioning (Python)
paper/python/study2_images.py
Scrapes article pages for images, downloads them, and generates captions using vistopics.
Topic Modeling (R)
paper/R/study2_images_lda.R
Runs LDA on captions from the news images dataset.
All datasets and additional materials needed to run the LDA analyses are available on OSF:
https://osf.io/vhdaj/
If you have any questions or feedback, feel free to contact:
Ayse Lokmanoglu & Dror Walter
GitHub: https://github.com/aysedeniz09/VisTopics
The vistopics package incorporates and builds upon the work of the following projects and resources:
We thank the developers and maintainers of these tools for making their work publicly available and for their contributions to the open-source community.