v1.4.5 deal with invalid zero vectors in queries for cosine similarity.
masajiro committed Oct 20, 2018
1 parent 355754a commit 79ba11c
Showing 10 changed files with 62 additions and 26 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -4,7 +4,7 @@ project(ngt)

set(ngt_VERSION_MAJOR 1)
set(ngt_VERSION_MINOR 4)
set(ngt_VERSION_PATCH 4)
set(ngt_VERSION_PATCH 5)

set(ngt_VERSION ${ngt_VERSION_MAJOR}.${ngt_VERSION_MINOR}.${ngt_VERSION_PATCH})
set(ngt_SOVERSION ${ngt_VERSION_MAJOR})
15 changes: 11 additions & 4 deletions README-jp.md
@@ -7,6 +7,11 @@ Neighborhood Graph and Tree for Indexing High-dimensional Data

大量(数百万から数千万データ)の高次元ベクトルデータ(数十~数千次元)に対して高速な近似近傍検索を可能とするコマンド及びライブラリを提供します。

ニュース
-------

- [ONNG](README-jp.md#onng)が利用可能になりました。(2018/08/08 : v1.4.0)

インストール
-----------

@@ -65,16 +70,18 @@ Neighborhood Graph and Tree for Indexing High-dimensional Data

関連文献
--------
##### [ONNG](bin/ngt/README-jp.md#onng)
- Iwasaki, M., Miyazaki, D.: Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity. arXiv:1810.07355 [cs] (2018). ([pdf](https://arxiv.org/abs/1810.07355))

##### PANNG
- Iwasaki, M.: Pruned Bi-directed K-nearest Neighbor Graph for Proximity Search. Proc. of SISAP2016 (2016) 20-33.
##### [PANNG](bin/ngt/README-jp.md#panng)
- Iwasaki, M.: Pruned Bi-directed K-nearest Neighbor Graph for Proximity Search. Proc. of SISAP2016 (2016) 20-33. ([pdf](https://link.springer.com/chapter/10.1007/978-3-319-46759-7_2))
- Sugawara, K., Kobayashi, H. and Iwasaki, M.: On Approximately Searching for Similar Word Embeddings. Proc. of ACL2016 (2016) 2265-2275. ([pdf](https://aclweb.org/anthology/P/P16/P16-1214.pdf))

##### ANNGT
##### [ANNGT](bin/ngt/README-jp.md#anngt)
- Iwasaki, M.: Applying a Graph-Structured Index to Product Image Search (in Japanese). IIEEJ Journal 42(5) (2013) 633-641. ([pdf](https://s.yimg.jp/i/docs/research_lab/articles/miwasaki-iieej-jnl-2013.pdf))
- Iwasaki, M.: Proximity search using approximate k nearest neighbor graph with a tree structured index (in Japanese). IPSJ Journal 52(2) (2011) 817-828. ([pdf](https://s.yimg.jp/i/docs/research_lab/articles/miwasaki-ipsj-jnl-2011.pdf))

##### ANNG
##### [ANNG](bin/ngt/README-jp.md#anng)
- Iwasaki, M.: Proximity search in metric spaces using approximate k nearest neigh-bor graph (in Japanese). IPSJ Trans. on Database 3(1) (2010) 18-28. ([pdf](https://s.yimg.jp/i/docs/research_lab/articles/miwasaki-ipsj-tod-2010.pdf))

Copyright © 2015-2018 Yahoo Japan Corporation All Rights Reserved.
15 changes: 11 additions & 4 deletions README.md
@@ -7,6 +7,11 @@ Neighborhood Graph and Tree for Indexing High-dimensional Data

**NGT** provides commands and a library for performing high-speed approximate nearest neighbor searches against a large volume of data (several million to several 10 million items of data) in high dimensional vector data space (several ten to several thousand dimensions).

News
----

- [ONNG](README.md#onng) is now available. (8/8/2018 : v1.4.0)

Installation
------------

@@ -73,16 +78,18 @@ Note that only for contributions to the NGT repository on the GitHub (https://gi

Publications
------------
##### [ONNG](bin/ngt/README.md#onng)
- Iwasaki, M., Miyazaki, D.: Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity. arXiv:1810.07355 [cs] (2018). ([pdf](https://arxiv.org/abs/1810.07355))

##### PANNG
- Iwasaki, M.: Pruned Bi-directed K-nearest Neighbor Graph for Proximity Search. Proc. of SISAP2016 (2016) 20-33.
##### [PANNG](bin/ngt/README.md#panng)
- Iwasaki, M.: Pruned Bi-directed K-nearest Neighbor Graph for Proximity Search. Proc. of SISAP2016 (2016) 20-33. ([pdf](https://link.springer.com/chapter/10.1007/978-3-319-46759-7_2))
- Sugawara, K., Kobayashi, H. and Iwasaki, M.: On Approximately Searching for Similar Word Embeddings. Proc. of ACL2016 (2016) 2265-2275. ([pdf](https://aclweb.org/anthology/P/P16/P16-1214.pdf))

##### ANNGT
##### [ANNGT](bin/ngt/README.md#anngt)
- Iwasaki, M.: Applying a Graph-Structured Index to Product Image Search (in Japanese). IIEEJ Journal 42(5) (2013) 633-641. ([pdf](https://s.yimg.jp/i/docs/research_lab/articles/miwasaki-iieej-jnl-2013.pdf))
- Iwasaki, M.: Proximity search using approximate k nearest neighbor graph with a tree structured index (in Japanese). IPSJ Journal 52(2) (2011) 817-828. ([pdf](https://s.yimg.jp/i/docs/research_lab/articles/miwasaki-ipsj-jnl-2011.pdf))

##### ANNG
##### [ANNG](bin/ngt/README.md#anng)
- Iwasaki, M.: Proximity search in metric spaces using approximate k nearest neigh-bor graph (in Japanese). IPSJ Trans. on Database 3(1) (2010) 18-28. ([pdf](https://s.yimg.jp/i/docs/research_lab/articles/miwasaki-ipsj-tod-2010.pdf))

Copyright © 2015-2018 Yahoo Japan Corporation All Rights Reserved.
2 changes: 1 addition & 1 deletion bin/ngt/README-jp.md
@@ -302,7 +302,7 @@ no\_of\_forcedly\_pruned\_edgesはno\_of\_selectively\_pruned\_edgesより大き



#### ONNG(文献なし)
#### [ONNG](/README.md#onng)
```
$ ngt create -i t -g a -S 0 -e 0.0 -E no_of_edges -d dimensionality_of_data -o data_type -D distatnce_type anng-index vector-data.dat
$ ngt reconstruct-graph -m S -o outdegree -i indegree anng-index onng-index
2 changes: 1 addition & 1 deletion bin/ngt/README.md
@@ -306,7 +306,7 @@ Perform a neighborhood search by three queries specified in a file:



#### ONNG (not yet published)
#### [ONNG](/README.md#onng)
```
$ ngt create -i t -g a -S 0 -e 0.0 -E no_of_edges -d dimensionality_of_data -o data_type -D distatnce_type anng-index vector-data.dat
$ ngt reconstruct-graph -m S -o outdegree -i indegree anng-index onng-index
4 changes: 2 additions & 2 deletions lib/NGT/Common.h
@@ -435,7 +435,7 @@ namespace NGT {
ifstream st(f);
if (!st) {
stringstream msg;
msg << "PropertSet::load: Cannot load the property file " << f << ".";
msg << "PropertySet::load: Cannot load the property file " << f << ".";
NGTThrowException(msg);
}
load(st);
@@ -444,7 +444,7 @@ namespace NGT {
ofstream st(f);
if (!st) {
stringstream msg;
msg << "PropertSet::save: Cannot save. " << f << endl;
msg << "PropertySet::save: Cannot save. " << f << endl;
NGTThrowException(msg);
}
save(st);
19 changes: 17 additions & 2 deletions lib/NGT/ObjectRepository.h
@@ -125,7 +125,14 @@ namespace NGT {
vector<double> object;
try {
extractObjectFromText(line, "\t ", object);
push_back((PersistentObject*)allocateNormalizedPersistentObject(object));
PersistentObject *obj = 0;
try {
obj = allocateNormalizedPersistentObject(object);
} catch (Exception &err) {
cerr << err.what() << " continue..." << endl;
obj = allocatePersistentObject(object);
}
push_back(obj);
} catch (Exception &err) {
std::cerr << "ObjectSpace::readText: Warning! Invalid line. [" << line << "] Skip the line " << lineNo << " and continue." << std::endl;
}
@@ -152,7 +159,15 @@ namespace NGT {
object.push_back(data[dataidx]);
}
try {
push_back((PersistentObject*)allocateNormalizedPersistentObject(object));
PersistentObject *obj = 0;
try {
obj = allocateNormalizedPersistentObject(object);
} catch (Exception &err) {
cerr << err.what() << " continue..." << endl;
obj = allocatePersistentObject(object);
}
push_back(obj);

} catch (Exception &err) {
std::cerr << "ObjectSpace::readText: Warning! Invalid data. Skip the data no. " << idx << " and continue." << std::endl;
}
5 changes: 3 additions & 2 deletions lib/NGT/ObjectSpace.h
@@ -232,8 +232,9 @@ namespace NGT {
sum += (double)data[i] * (double)data[i];
}
if (sum == 0.0) {
cerr << "normalize: Warning! the object is a zero vector for the cosine similarity or angle distance." << endl;
return;
stringstream msg;
msg << "ObjectSpace::normalize: Error! the object is an invalid zero vector for the cosine similarity or angle distance.";
NGTThrowException(msg);
}
sum = sqrt(sum);
for (size_t i = 0; i < dim; i++) {
12 changes: 4 additions & 8 deletions lib/NGT/PrimitiveComparator.h
@@ -101,14 +101,13 @@ namespace NGT {
__m128 v2;
__m128 sum2 = _mm_add_ps(_mm256_extractf128_ps(sum, 0), _mm256_extractf128_ps(sum, 1));

lastgroup = last - 3;

while (a < lastgroup) {
while (a < last) {
v2 = _mm_sub_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
sum2 = _mm_add_ps(sum2, _mm_mul_ps(v2, v2));
a += 4;
b += 4;
}

float f[4];
_mm_store_ps(f, sum2);

@@ -300,9 +299,7 @@ namespace NGT {
__m128 am2, bm2;
__m128 sum2 = _mm_add_ps(_mm256_extractf128_ps(sum, 0), _mm256_extractf128_ps(sum, 1));

lastgroup = last - 3;

while (a < lastgroup) {
while (a < last) {
am2 = _mm_loadu_ps(a);
bm2 = _mm_loadu_ps(b);
sum2 = _mm_add_ps(sum2, _mm_mul_ps(am2, bm2));
@@ -349,7 +346,6 @@ namespace NGT {
double nb = f[0] + f[1] + f[2] + f[3] + f[4] + f[5] + f[6] + f[7];
_mm256_store_ps(f, sum);
double s = f[0] + f[1] + f[2] + f[3] + f[4] + f[5] + f[6] + f[7];

while (a < last) {
double av = *a;
double bv = *b;
@@ -382,7 +378,7 @@ namespace NGT {

template <typename OBJECT_TYPE>
inline static double compareAngleDistance(const OBJECT_TYPE *a, const OBJECT_TYPE *b, size_t size) {
double cosine = compareAngleDistance(a, b, size);
double cosine = compareCosine(a, b, size);
if (cosine >= 1.0F) {
return 0.0F;
} else if (cosine <= -1.0F) {
12 changes: 11 additions & 1 deletion python/src/ngtpy.cpp
@@ -125,7 +125,17 @@ class Index : public NGT::Index {
) {
py::array_t<float> qobject(query);
py::buffer_info qinfo = qobject.request();
NGT::Object *ngtquery = NGT::Index::allocateObject(static_cast<float*>(qinfo.ptr), qinfo.size);
NGT::Object *ngtquery = 0;
try {
ngtquery = NGT::Index::allocateObject(static_cast<float*>(qinfo.ptr), qinfo.size);
} catch (NGT::Exception &e) {
std::cerr << e.what() << endl;
if (!withDistance) {
return py::array_t<int>();
} else {
return py::list();
}
}
NGT::SearchContainer sc(*ngtquery);
sc.setSize(size); // the number of resultant objects.
sc.setEpsilon(epsilon); // set exploration coefficient.