Topic Modeling with LDA (scikit-learn)
- Category: Natural Language Processing
- Tools Used: NumPy, Pandas, NLTK, spaCy, Matplotlib, Seaborn, WordCloud, pyLDAvis, scikit-learn (CountVectorizer, TfidfVectorizer), joblib
- Project URL: https://github.com/Shubhkirti24/NLP/blob/main/Topic%20Modeling_LDA.ipynb
Project Summary
This project applies Latent Dirichlet Allocation (LDA) to a dataset of 2,225 BBC news articles to identify the underlying topics and understand how they are distributed across the dataset. The articles span categories such as business, sports, politics, technology, and entertainment. The work covers data preprocessing, topic modeling, topic visualization, and evaluation of the model’s performance.