Commit fdf40eb

PERF: pyemd to POT for EMD computation in wmdistance (#3327)
* PERF: switch from pyemd to POT for EMD computation
* Adapt citations
* Adapt dependency
* Adapt tests
* Update cache for gallery

Co-authored-by: TLouf <[email protected]>
1 parent a435f24 commit fdf40eb

32 files changed: 866 additions and 370 deletions

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ env:
  # them here for now. They'll get picked up by the multibuild stuff
  # running in multibuild/common_utils.sh.
  #
- - TEST_DEPENDS="pytest mock cython nmslib pyemd testfixtures python-levenshtein==0.12.0 visdom==0.1.8.9 scikit-learn"
+ - TEST_DEPENDS="pytest mock cython nmslib POT testfixtures python-levenshtein==0.12.0 visdom==0.1.8.9 scikit-learn"

  matrix:
  #

docs/notebooks/WMD_tutorial.ipynb

Lines changed: 2 additions & 3 deletions
@@ -30,7 +30,7 @@
  "\n",
  "## Running this notebook\n",
  "\n",
- "You can download this [iPython Notebook](http://ipython.org/notebook.html), and run it on your own computer, provided you have installed Gensim, PyEMD, NLTK, and downloaded the necessary data.\n",
+ "You can download this [iPython Notebook](http://ipython.org/notebook.html), and run it on your own computer, provided you have installed Gensim, POT, NLTK, and downloaded the necessary data.\n",
  "\n",
  "The notebook was run on an Ubuntu machine with an Intel core i7-4770 CPU 3.40GHz (8 cores) and 32 GB memory. Running the entire notebook on this machine takes about 3 minutes.\n",
  "\n",
@@ -524,8 +524,7 @@
  "source": [
  "## References\n",
  "\n",
- "1. Ofir Pele and Michael Werman, *A linear time histogram metric for improved SIFT matching*, 2008.\n",
- "* Ofir Pele and Michael Werman, *Fast and robust earth mover's distances*, 2009.\n",
+ "1. * Rémi Flamary et al. *POT: Python Optimal Transport*, 2021.\n",
  "* Matt Kusner et al. *From Embeddings To Document Distances*, 2015.\n",
  "* Thomas Mikolov et al. *Efficient Estimation of Word Representations in Vector Space*, 2013."
  ]

docs/notebooks/soft_cosine_tutorial.ipynb

Lines changed: 23 additions & 23 deletions
@@ -30,7 +30,7 @@
  ">\n",
  "\n",
  "## Running this notebook\n",
- "You can download this [Jupyter notebook](http://jupyter.org/), and run it on your own computer, provided you have installed the `gensim`, `jupyter`, `sklearn`, `pyemd`, and `wmd` Python packages.\n",
+ "You can download this [Jupyter notebook](http://jupyter.org/), and run it on your own computer, provided you have installed the `gensim`, `jupyter`, `sklearn`, `POT`, and `wmd` Python packages.\n",
  "\n",
  "The notebook was run on an Ubuntu machine with an Intel core i7-6700HQ CPU 3.10GHz (4 cores) and 16 GB memory. Assuming all resources required by the notebook have already been downloaded, running the entire notebook on this machine takes about 30 minutes."
  ]
@@ -357,7 +357,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "!pip install pyemd"
+ "!pip install POT"
  ]
  },
  {
@@ -404,7 +404,7 @@
  " return similarities\n",
  "\n",
  "def wmd_gensim(query, documents):\n",
- " # Compute Word Mover's Distance as implemented in PyEMD by William Mayner\n",
+ " # Compute Word Mover's Distance as implemented in POT\n",
  " # between the query and the documents.\n",
  " index = WmdSimilarity(documents, w2v_model)\n",
  " similarities = index[query]\n",
@@ -532,26 +532,26 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Dataset | Strategy | MAP score | Elapsed time (sec)\n",
- ":---|:---|:---|---:\n",
- "2016-test|softcossim|78.52 ±11.18|6.00 ±0.79\n",
- "2016-test|**Winner (UH-PRHLT-primary)**|76.70 ±0.00|\n",
- "2016-test|cossim|76.45 ±10.40|0.64 ±0.08\n",
- "2016-test|wmd-gensim|76.23 ±11.42|5.37 ±0.64\n",
- "2016-test|**Baseline 1 (IR)**|74.75 ±0.00|\n",
- "2016-test|wmd-relax|71.05 ±11.06|1.11 ±0.09\n",
- "2016-test|**Baseline 2 (random)**|46.98 ±0.00|\n",
- "\n",
- "\n",
- "Dataset | Strategy | MAP score | Elapsed time (sec)\n",
- ":---|:---|:---|---:\n",
- "2017-test|**Winner (SimBow-primary)**|47.22 ±0.00|\n",
- "2017-test|softcossim|45.88 ±16.22|7.08 ±1.49\n",
- "2017-test|cossim|44.38 ±14.71|0.74 ±0.10\n",
- "2017-test|wmd-gensim|44.06 ±15.92|6.20 ±0.87\n",
- "2017-test|wmd-relax|43.52 ±16.30|1.30 ±0.18\n",
- "2017-test|**Baseline 1 (IR)**|41.85 ±0.00|\n",
- "2017-test|**Baseline 2 (random)**|29.81 ±0.00|"
+ "Dataset | Strategy | MAP score | Elapsed time (sec)\n",
+ ":---|:---|:---|---:\n",
+ "2016-test|softcossim|78.52 ±11.18|6.00 ±0.79\n",
+ "2016-test|**Winner (UH-PRHLT-primary)**|76.70 ±0.00|\n",
+ "2016-test|cossim|76.45 ±10.40|0.64 ±0.08\n",
+ "2016-test|wmd-gensim|76.23 ±11.42|5.37 ±0.64\n",
+ "2016-test|**Baseline 1 (IR)**|74.75 ±0.00|\n",
+ "2016-test|wmd-relax|71.05 ±11.06|1.11 ±0.09\n",
+ "2016-test|**Baseline 2 (random)**|46.98 ±0.00|\n",
+ "\n",
+ "\n",
+ "Dataset | Strategy | MAP score | Elapsed time (sec)\n",
+ ":---|:---|:---|---:\n",
+ "2017-test|**Winner (SimBow-primary)**|47.22 ±0.00|\n",
+ "2017-test|softcossim|45.88 ±16.22|7.08 ±1.49\n",
+ "2017-test|cossim|44.38 ±14.71|0.74 ±0.10\n",
+ "2017-test|wmd-gensim|44.06 ±15.92|6.20 ±0.87\n",
+ "2017-test|wmd-relax|43.52 ±16.30|1.30 ±0.18\n",
+ "2017-test|**Baseline 1 (IR)**|41.85 ±0.00|\n",
+ "2017-test|**Baseline 2 (random)**|29.81 ±0.00|"
  ]
  },
  {

docs/src/auto_examples/core/index.rst

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+
+
+ .. _sphx_glr_auto_examples_core:
+
+ Core Tutorials: New Users Start Here!
+ -------------------------------------
+
+ If you're new to gensim, we recommend going through all core tutorials in order.
+ Understanding this functionality is vital for using gensim effectively.
+
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnails">
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="This tutorial introduces Documents, Corpora, Vectors and Models: the basic concepts and terms n...">
+
+ .. only:: html
+
+ .. image:: /auto_examples/core/images/thumb/sphx_glr_run_core_concepts_thumb.png
+ :alt: Core Concepts
+
+ :ref:`sphx_glr_auto_examples_core_run_core_concepts.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">Core Concepts</div>
+ </div>
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates transforming text into a vector space representation.">
+
+ .. only:: html
+
+ .. image:: /auto_examples/core/images/thumb/sphx_glr_run_corpora_and_vector_spaces_thumb.png
+ :alt: Corpora and Vector Spaces
+
+ :ref:`sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">Corpora and Vector Spaces</div>
+ </div>
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="Introduces transformations and demonstrates their use on a toy corpus.">
+
+ .. only:: html
+
+ .. image:: /auto_examples/core/images/thumb/sphx_glr_run_topics_and_transformations_thumb.png
+ :alt: Topics and Transformations
+
+ :ref:`sphx_glr_auto_examples_core_run_topics_and_transformations.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">Topics and Transformations</div>
+ </div>
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates querying a corpus for similar documents.">
+
+ .. only:: html
+
+ .. image:: /auto_examples/core/images/thumb/sphx_glr_run_similarity_queries_thumb.png
+ :alt: Similarity Queries
+
+ :ref:`sphx_glr_auto_examples_core_run_similarity_queries.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">Similarity Queries</div>
+ </div>
+
+
+ .. raw:: html
+
+ </div>
+
+
+ .. toctree::
+ :hidden:
+
+ /auto_examples/core/run_core_concepts
+ /auto_examples/core/run_corpora_and_vector_spaces
+ /auto_examples/core/run_topics_and_transformations
+ /auto_examples/core/run_similarity_queries
+

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+
+
+ .. _sphx_glr_auto_examples_howtos:
+
+ How-to Guides: Solve a Problem
+ ------------------------------
+
+ These **goal-oriented guides** demonstrate how to **solve a specific problem** using gensim.
+
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnails">
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates simple and quick access to common corpora and pretrained models.">
+
+ .. only:: html
+
+ .. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_downloader_api_thumb.png
+ :alt: How to download pre-trained models and corpora
+
+ :ref:`sphx_glr_auto_examples_howtos_run_downloader_api.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">How to download pre-trained models and corpora</div>
+ </div>
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="How to author documentation for Gensim.">
+
+ .. only:: html
+
+ .. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_doc_thumb.png
+ :alt: How to Author Gensim Documentation
+
+ :ref:`sphx_glr_auto_examples_howtos_run_doc.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">How to Author Gensim Documentation</div>
+ </div>
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="Shows how to reproduce results of the &quot;Distributed Representation of Sentences and Documents&quot; p...">
+
+ .. only:: html
+
+ .. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_doc2vec_imdb_thumb.png
+ :alt: How to reproduce the doc2vec 'Paragraph Vector' paper
+
+ :ref:`sphx_glr_auto_examples_howtos_run_doc2vec_imdb.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">How to reproduce the doc2vec 'Paragraph Vector' paper</div>
+ </div>
+
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates how you can visualize and compare trained topic models.">
+
+ .. only:: html
+
+ .. image:: /auto_examples/howtos/images/thumb/sphx_glr_run_compare_lda_thumb.png
+ :alt: How to Compare LDA Models
+
+ :ref:`sphx_glr_auto_examples_howtos_run_compare_lda.py`
+
+ .. raw:: html
+
+ <div class="sphx-glr-thumbnail-title">How to Compare LDA Models</div>
+ </div>
+
+
+ .. raw:: html
+
+ </div>
+
+
+ .. toctree::
+ :hidden:
+
+ /auto_examples/howtos/run_downloader_api
+ /auto_examples/howtos/run_doc
+ /auto_examples/howtos/run_doc2vec_imdb
+ /auto_examples/howtos/run_compare_lda
+

docs/src/auto_examples/index.rst

Lines changed: 12 additions & 12 deletions
@@ -152,35 +152,35 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod

  .. raw:: html

- <div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s EnsembleLda model">
+ <div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s fastText model and demonstrates its use on the Lee Corpus.">

  .. only:: html

- .. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
- :alt: Ensemble LDA
+ .. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_fasttext_thumb.png
+ :alt: FastText Model

- :ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py`
+ :ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py`

  .. raw:: html

- <div class="sphx-glr-thumbnail-title">Ensemble LDA</div>
+ <div class="sphx-glr-thumbnail-title">FastText Model</div>
  </div>


  .. raw:: html

- <div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s fastText model and demonstrates its use on the Lee Corpus.">
+ <div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s EnsembleLda model">

  .. only:: html

- .. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_fasttext_thumb.png
- :alt: FastText Model
+ .. image:: /auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
+ :alt: Ensemble LDA

- :ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py`
+ :ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py`

  .. raw:: html

- <div class="sphx-glr-thumbnail-title">FastText Model</div>
+ <div class="sphx-glr-thumbnail-title">Ensemble LDA</div>
  </div>


@@ -220,7 +220,7 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod

  .. raw:: html

- <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the SCM.">
+ <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the WMD.">

  .. only:: html

@@ -237,7 +237,7 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod

  .. raw:: html

- <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the SCM.">
+ <div class="sphx-glr-thumbcontainer" tooltip="Demonstrates using Gensim&#x27;s implemenation of the SCM.">

  .. only:: html
