Topic modelling with spaCy, Gensim and Textacy

The Jupyter notebook 'topic-modelling.ipynb' contains the following sections:

Initialize. Setting up the environment and loading the data.
Text extraction. Phrase and token extraction with Gensim and spaCy.
Topic modelling. Using Textacy's LDA model.
Data processing. Calculating data for visualization and export.
Model evaluation. A collection of visualizations of the resulting topics.
Export data. The data can be used to create further visualizations or be imported into a graph.
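To illustrate what the topic-modelling step estimates, here is a minimal plain-Python sketch of LDA. As far as we know, Textacy's topic model wraps scikit-learn's variational LDA; this sketch instead uses collapsed Gibbs sampling, a different inference method for the same model, so it is not the notebook's code. The function `lda_gibbs` and its parameters (`n_topics`, `alpha`, `beta`, `seed`) are hypothetical names chosen for the example, and the input is assumed to be already tokenized.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=None):
    """Collapsed Gibbs sampling for LDA on pre-tokenized documents (illustrative sketch).

    Returns (doc_topic, topic_words): doc_topic[d][k] is the estimated share of
    topic k in document d; topic_words[k] maps each word to its probability in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: document-topic, topic-word and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [defaultdict(int) for _ in range(n_topics)]
    n_k = [0] * n_topics
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = j | all other assignments).
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_words = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_words


# Usage on a toy corpus (made-up tokens, not taken from tb_data.tsv).
docs = [
    ["gene", "protein", "cell", "gene"],
    ["protein", "cell", "enzyme"],
    ["galaxy", "star", "orbit"],
    ["star", "galaxy", "telescope", "orbit"],
]
doc_topic, topic_words = lda_gibbs(docs, n_topics=2, n_iter=100, seed=1)
for k, words in enumerate(topic_words):
    print(k, sorted(words, key=words.get, reverse=True)[:3])
```

The `seed` argument is optional; without it every run starts from a different random assignment, which is the kind of run-to-run variation described under Caveat.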

General concept

The emphasis in this notebook is on facilitating an iterative process in which you can easily adjust the stopwords and the number of topics. Furthermore, it contains features for re-focusing on subtopics, thereby creating a hierarchy of topics.
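The iterative loop can be pictured with two small helpers: one that removes a user-maintained stopword set before re-running the model, and one that selects the documents dominated by a chosen topic so they can be modelled again as a sub-corpus, one level down in the hierarchy. This is a sketch under stated assumptions, not the notebook's implementation: `remove_stopwords`, `refocus` and the `min_weight` threshold are hypothetical, and `doc_topic` is assumed to be a list of per-document topic weights as produced by any LDA model.

```python
def remove_stopwords(docs, stopwords):
    """Drop user-defined stopwords (case-insensitive) from tokenized documents."""
    stop = {w.lower() for w in stopwords}
    return [[tok for tok in doc if tok.lower() not in stop] for doc in docs]


def refocus(docs, doc_topic, topic, min_weight=0.5):
    """Select documents whose weight for `topic` is at least `min_weight`.

    Returns (indices, sub_docs); sub_docs can be fed back into the topic model
    to look for subtopics of `topic`.
    """
    keep = [i for i, row in enumerate(doc_topic) if row[topic] >= min_weight]
    return keep, [docs[i] for i in keep]


# Usage with made-up tokens and a hypothetical 2-topic doc-topic matrix.
docs = [
    ["the", "clinical", "vaccine", "trial"],
    ["the", "drug", "resistance", "study"],
    ["a", "vaccine", "dose", "study"],
]
docs = remove_stopwords(docs, {"the", "a", "study"})
doc_topic = [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3]]
indices, sub_docs = refocus(docs, doc_topic, topic=0, min_weight=0.5)
print(indices)   # [0, 2]
print(sub_docs)  # [['clinical', 'vaccine', 'trial'], ['vaccine', 'dose']]
```

A fixed weight threshold is only one possible selection rule; picking documents by their single dominant topic would be an equally simple alternative.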

Input

'data-in/tb_data.tsv' contains ~2100 scientific articles, each with the following fields: DOI, title, abstract and keywords.
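A minimal sketch of reading such a file with the standard library, combining title, abstract and keywords into one text per article. It assumes the TSV has a header row with columns named `doi`, `title`, `abstract` and `keywords`; the real column names and layout may differ, and `load_articles` is a hypothetical helper, not part of the notebook.

```python
import csv
import io


def load_articles(lines):
    """Read tab-separated article records into dicts with 'doi' and 'text' keys.

    Assumes a header row named doi/title/abstract/keywords (an assumption about tb_data.tsv).
    """
    # QUOTE_NONE treats quote characters inside abstracts as ordinary text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    articles = []
    for row in reader:
        parts = [(row.get(field) or "").strip() for field in ("title", "abstract", "keywords")]
        articles.append({"doi": row.get("doi"), "text": " ".join(p for p in parts if p)})
    return articles


# For the real file one would open it with:
#   with open("data-in/tb_data.tsv", newline="", encoding="utf-8") as f:
#       articles = load_articles(f)
sample = io.StringIO(
    "doi\ttitle\tabstract\tkeywords\n"
    "10.1000/xyz1\tA title\tAn abstract.\tkw1; kw2\n"
)
articles = load_articles(sample)
print(articles[0]["text"])  # A title An abstract. kw1; kw2
```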

Output

Start by looking at the notebook 'topic-modelling.ipynb'. Further down in the file you will find the 'visualization' section, which gives an overview of the modelling data.

Most of the other files in the output data directory (data-out/) are exported to be used as input in other projects. If you are interested in understanding the modelled topics in more detail, you may look at 'tb_main_doc-top.html' in the output directory, which lists the 15 most relevant articles for each topic.
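One plausible way to produce such a list is to rank the articles by their weight for each topic and keep the top 15. The sketch below assumes that "most relevant" means "highest doc-topic weight" and that each article is labelled by its DOI; both are assumptions, and `top_docs_per_topic` and `ranking_to_html` are hypothetical helpers rather than the code that writes 'tb_main_doc-top.html'.

```python
import html


def top_docs_per_topic(doc_topic, labels, top_n=15):
    """Rank documents by their weight for each topic and keep the top_n per topic."""
    n_topics = len(doc_topic[0]) if doc_topic else 0
    ranking = {}
    for k in range(n_topics):
        # Stable sort: documents with equal weight keep their original order.
        order = sorted(range(len(doc_topic)), key=lambda d: doc_topic[d][k], reverse=True)
        ranking[k] = [(labels[d], doc_topic[d][k]) for d in order[:top_n]]
    return ranking


def ranking_to_html(ranking):
    """Render the ranking as a minimal HTML page with one ordered list per topic."""
    parts = ["<html><body>"]
    for k, docs in ranking.items():
        parts.append(f"<h2>Topic {k}</h2><ol>")
        parts.extend(f"<li>{html.escape(label)} ({weight:.2f})</li>" for label, weight in docs)
        parts.append("</ol>")
    parts.append("</body></html>")
    return "\n".join(parts)


# Usage with a hypothetical doc-topic matrix and placeholder DOIs.
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = ["10.1000/a", "10.1000/b", "10.1000/c"]
ranking = top_docs_per_topic(doc_topic, labels, top_n=2)
print(ranking[0])  # [('10.1000/a', 0.9), ('10.1000/c', 0.6)]
page = ranking_to_html(ranking)
```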

Caveat

LDA is a stochastic algorithm, so topic modelling with it will produce (slightly) different results even when it is run on the same data. The exact same results can therefore not be reproduced.
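Because topics also come out in a different order from run to run, a simple way to see how much two runs agree is to match each topic of one run to the most similar topic of the other, using the Jaccard overlap of their top words. This is a hedged sketch, not a feature of the notebook: `jaccard` and `match_topics` are hypothetical helpers, and each run is assumed to be given as a list of top-word lists, one per topic.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def match_topics(run_a, run_b):
    """For each topic in run_a, find the most similar topic in run_b.

    Returns (index_in_a, index_in_b, similarity) triples. The matching is greedy
    per topic, so two topics of run_a may map to the same topic of run_b.
    """
    matches = []
    for i, words_a in enumerate(run_a):
        scores = [jaccard(words_a, words_b) for words_b in run_b]
        j = max(range(len(run_b)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches


# Usage with made-up top words from two hypothetical runs.
run_a = [["gene", "protein", "cell"], ["star", "galaxy", "orbit"]]
run_b = [["galaxy", "star", "telescope"], ["protein", "gene", "enzyme"]]
print(match_topics(run_a, run_b))  # [(0, 1, 0.5), (1, 0, 0.5)]
```

Low similarities for many topics suggest that the number of topics or the stopword list may need another iteration.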

Source repository: cheTesta/topic-model