# Numpy matrix multiplication issue with 20 elements

## Issue

I am using a matrix multiplication method to retrieve the positions of True and False values in an array; this is necessary because I cannot use a for loop (I have thousands of records).
The procedure is the following:

``````
import numpy as np
# Create a test array
test_array = np.array([[False, True, False, False, False, True]])
# Create a set of unique "tens", each one identifying a position
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
# Multiply the matrix
print(int(np.dot(test_array, uniq_tens)[0]))
100010
``````

The 100010 must be read from right to left (0=False, 1=True, 0=False, 0=False, 0=False, 1=True). Everything works fine except when test_array has 20 elements.

``````
# This works fine - Test with 21 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
21
111000000000000000010

# This works fine - Test with 19 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
19
1000000000000000010

# This does not work - Test with 20 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
20
10000000000000000000
``````

I tested with numpy versions 1.16.4, 1.19.4 and 1.19.5. Could you please help me understand why? I am worried it can also happen with numbers other than 20.

Thanks a lot for your help!

# Explanation

You are hitting the `int64` limit:

``````
print(len(str(2 ** (64 - 1))))
# 19
``````

when computing `uniq_tens`, which causes data type problems in conjunction with `np.dot()` on mixed data type inputs.

More precisely, what happens here is that:

1. the content of `uniq_tens` consists of Python `int`s, which have arbitrary precision;
2. when you call `np.dot()`, the `uniq_tens` list is converted to a NumPy array with an unspecified data type:
   - when the maximum value is at most `np.iinfo(np.int64).max`, the dtype is inferred to be `int64`;
   - when the maximum value is between `np.iinfo(np.int64).max` and `np.iinfo(np.uint64).max`, the dtype is inferred to be `uint64`;
   - when the maximum value is above that, the array retains Python objects and falls back to arbitrary precision;
3. there may be an extra cast in `np.dot()` if the inputs have mixed dtypes; in the case of `np.bool_` and `np.uint64`, the inferred common type is `np.float64`.
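The surprising promotion to floating point can be checked directly with `np.result_type()`; for instance, no 64-bit integer dtype can represent both ranges of `int64` and `uint64`, so NumPy promotes that pair to `float64`:

```python
import numpy as np

# No integer dtype can hold both int64 negatives and uint64 values
# above 2 ** 63 - 1, so NumPy promotes the pair to float64.
print(np.result_type(np.int64, np.uint64))
# float64
```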

Now:

``````
max_int64 = np.iinfo(np.int64).max
print(max_int64, len(str(max_int64)))
# 9223372036854775807 19

max_uint64 = np.iinfo(np.uint64).max
print(max_uint64, len(str(max_uint64)))
# 18446744073709551615 20

print(repr(np.array([max_int64])))
# array([9223372036854775807])
print(repr(np.array([max_uint64])))
# array([18446744073709551615], dtype=uint64)
print(repr(np.array([max_uint64 + 1])))
# array([18446744073709551616], dtype=object)
``````

So, with up to 19 elements (largest power `10 ** 18`) and with 21 or more (largest power at least `10 ** 20`, which forces `object` dtype), everything works well.
With exactly 20 elements, the largest power `10 ** 19` only fits in `uint64`, so the list converts to `uint64`.
However, `np.dot()` then has to combine the boolean input with the `uint64` powers, and since neither `int64` nor `uint64` can safely hold such a mix, it casts everything to `np.float64`:

``````
print(np.dot([1], [max_int64]))
# 9223372036854775807
print(np.dot([1], [max_uint64]))
# 1.8446744073709552e+19
print(np.dot([1], [max_uint64 + 1]))
# 18446744073709551616
``````

``````
print(np.dot(np.array([1], dtype=np.uint64), [max_uint64]))
# 18446744073709551615
print(np.dot(np.array([4321], dtype=np.uint64), [max_uint64]))
# 18446744073709547295  # wrong result: wrapped around modulo 2 ** 64
``````

because pure `uint64` arithmetic has its own overflow problem: results silently wrap around modulo `2 ** 64`.
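To see why the trailing `10` disappears in the original 20-element example: `float64` has a 53-bit mantissa, and near `10 ** 19` the gap between consecutive representable values is 2048, so adding 10 is simply lost:

```python
# Near 1e19 a float64 can only resolve steps of 2 ** 11 == 2048,
# so 10 ** 19 + 10 rounds back down to 10 ** 19.
print(float(10 ** 19 + 10) == float(10 ** 19))
# True
print(int(float(10 ** 19 + 10)))
# 10000000000000000000
```

This is exactly the wrong value observed in the question.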

# Mitigation

One way to make sure the above code works all the time is to force the `dtype` of `uniq_tens` to `object`:

``````
import numpy as np

test_array = np.array([[0, 1] + [0] * 17 + [1]])
uniq_tens = np.array([(10 ** i) for i in range(test_array.shape[1])], dtype=object)

print(test_array.shape[1], int(np.dot(test_array, uniq_tens)[0]))
# 20 10000000000000000010
``````
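Since the computed number is really just a positional encoding with digits 0 and 1, an alternative mitigation (a sketch, not the method from the question) is to build the digit string directly from the boolean mask, avoiding numeric dtypes altogether:

```python
import numpy as np

test_array = np.array([[False, True] + [False] * 17 + [True]])
# Reverse so that the first element becomes the rightmost digit,
# then let Python's arbitrary-precision int parse the digit string.
digits = ''.join('1' if x else '0' for x in test_array[0][::-1])
print(test_array.shape[1], int(digits))
# 20 10000000000000000010
```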

# Other approaches

If we are after the fastest way of computing the integer with a specific base, a number of approaches can be devised:

``````
import numpy as np
import numba as nb

def bools_to_int(arr, base=2):
    return sum(base ** i for i, x in enumerate(arr.tolist()) if x)

def bools_to_int_dot(arr, base=2):
    pows = np.array([base ** i for i in range(len(arr))], dtype=object)
    return np.dot(arr, pows)

def bools_to_int_mul_sum(arr, base=2):
    pows = np.array([base ** i for i in range(len(arr))], dtype=object)
    return np.sum(arr * pows)

@nb.njit
def bools_to_int_nb(arr, base=2):
    n = arr.size
    result = 0
    for i in range(n):
        if arr[i]:
            result += base ** i
    return result
``````
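For example, with `base=10` the pure-Python variant reproduces the failing 20-element case from the question exactly (the definition is repeated here to keep the snippet self-contained):

```python
import numpy as np

# Same as bools_to_int() above, repeated for a runnable snippet.
def bools_to_int(arr, base=2):
    return sum(base ** i for i, x in enumerate(arr.tolist()) if x)

arr = np.array([False, True] + [False] * 17 + [True])
print(bools_to_int(arr, base=10))
# 10000000000000000010
```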

The looped approach can also be accelerated with Cython:

``````
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True

# cimport numpy as cnp
# cimport cython as ccy

# import numpy as np
# import cython as cy

cpdef bools_to_int_cy(arr, base=2):
    cdef long n = arr.size
    result = 0
    for i in range(n):
        if arr[i]:
            result += base ** i
    return result
``````

Note that the `bools_to_int_nb()` approach only works up to the `int64` limit, because Numba uses fixed-size integers.

Since the power operation is one of the most expensive in this computation, the powers can be precomputed to further speed up multiple calls:

``````
MAX_PRE_VAL = 256
BASES = list(range(2, 16))
POWS = {
    b: np.array([b ** i for i in range(MAX_PRE_VAL)])
    for b in BASES}

def bools_to_int_pre(arr, base=2, pows=POWS):
    return sum(pows[base][i] for i, x in enumerate(arr.tolist()) if x)

def bools_to_int_dot_pre(arr, base=2, pows=POWS):
    return np.dot(arr, pows[base][:len(arr)])

def bools_to_int_mul_sum_pre(arr, base=2, pows=POWS):
    return np.sum(arr * pows[base][:len(arr)])
``````

It is easy to see that all methods produce the same result (except for `bools_to_int_nb()`, with the limitations already noted):

``````
funcs = (
    bools_to_int, bools_to_int_pre,
    bools_to_int_dot, bools_to_int_dot_pre,
    bools_to_int_mul_sum, bools_to_int_mul_sum_pre,
    bools_to_int_cy, bools_to_int_nb)

rng = np.random.default_rng(42)
arr = rng.integers(0, 2, 112)
for func in funcs:
    print(f"{func.__name__!s:>32}  {func(arr)}")
``````
``````
                    bools_to_int  3556263535586786347937292461931686
                bools_to_int_pre  3556263535586786347937292461931686
                bools_to_int_dot  3556263535586786347937292461931686
            bools_to_int_dot_pre  3556263535586786347937292461931686
            bools_to_int_mul_sum  3556263535586786347937292461931686
        bools_to_int_mul_sum_pre  3556263535586786347937292461931686
                 bools_to_int_cy  3556263535586786347937292461931686
                 bools_to_int_nb  -4825705174627124058
``````
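The Numba result is not random garbage: it is the exact value reduced modulo `2 ** 64` and reinterpreted as a signed 64-bit integer, because `int64` additions and powers silently wrap around inside the JIT-compiled loop. This can be verified with a short computation (the exact value is the reference result printed above):

```python
# Reduce the exact result modulo 2 ** 64 and map it into the
# signed int64 range, mimicking the wraparound in the JIT loop.
exact = 3556263535586786347937292461931686
wrapped = exact % 2 ** 64
if wrapped >= 2 ** 63:
    wrapped -= 2 ** 64
print(wrapped)
# -4825705174627124058
```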

With the following code to produce the timings:

``````
rng = np.random.default_rng(42)

timings = {}
k = 16
for n in range(1, 128, 3):
    arrs = rng.integers(0, 2, (k, n))
    print(f"n = {n}")
    timings[n] = []
    base = [funcs[0](arr) for arr in arrs]
    for func in funcs:
        res = [func(arr) for arr in arrs]
        is_good = base == res
        timed = %timeit -r 8 -n 16 -q -o [func(arr) for arr in arrs]
        timing = timed.best * 1e6 / k
        timings[n].append(timing if is_good else None)
        print(f"{func.__name__:>24}  {is_good}  {timing:10.3f} µs")
``````

to be plotted with:

``````
import pandas as pd

df = pd.DataFrame(data=timings, index=[func.__name__ for func in funcs]).transpose()
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', figsize=(10, 8))
``````

Showing that, before hitting the `int64` limit, the `bools_to_int_nb()` approach is by far the fastest.
Conversely, for larger inputs, `np.dot()` with pre-computed powers is the fastest.
Without pre-computation, simple manual looping is the fastest, and the Cython acceleration provides a small but appreciable speed-up.

Beware that the base-2 case can probably be optimized further.
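For instance, for base 2 one can avoid computing powers entirely by packing the booleans into bytes (a sketch under that assumption; `bools_to_int_bits()` is a name made up here):

```python
import numpy as np

def bools_to_int_bits(arr):
    # Reverse so the first element ends up as the least significant
    # bit, pack 8 booleans per byte (MSB first), then let Python
    # assemble the arbitrary-precision integer from the raw bytes.
    packed = np.packbits(arr[::-1].astype(np.uint8))
    # packbits zero-pads the last byte, so shift the padding away.
    return int.from_bytes(packed.tobytes(), 'big') >> (-arr.size % 8)

arr = np.array([True, False, True])  # 2 ** 0 + 2 ** 2
print(bools_to_int_bits(arr))
# 5
```

Unlike the fixed-width Numba loop, this stays exact for arbitrarily long inputs, since the final integer is assembled with Python's arbitrary precision.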