Following Joshua Garland, Ryan James, and Elizabeth Bradley, information-theoretic redundancy is explained and estimated using weighted permutation entropy on a subset of the Two Sigma financial time-series data from Kaggle.
Why is this important/exciting?
Financial time-series are widely used for making predictions about, well, financial time-series (in the future). These predictions end up in reports that private and government decision-makers use to change the lives of thousands of people on a daily basis.
Often, those time-series (like a stock-market index) are very high-dimensional and show characteristics of chaos, so that, on average, prediction is futile. This is more or less the premise of the efficient-market hypothesis.
Based on this, a natural question emerges: is there a measure of dimensionality, complexity, or chaos that can quantify the presence (or absence) of structure in data that allows prediction? Basically, this is what permutation entropy promises:
Redundancy is an empirically tractable measure of the complexity that arises in real-world time-series data “which results from the dimension, nonlinearity, and nonstationarity of the generating process, as well as from measurement issues such as noise, aggregation, and finite data length.”
```python
import itertools

import numpy as np


def permutation_entropy(time_series, m, delay):
    """Compute the permutation entropy of a time series.

    Args:
        time_series: Time series for analysis
        m: Order of permutation entropy
        delay: Time delay

    Returns:
        Permutation entropy as a float
    """
    n = len(time_series)
    permutations = np.array(list(itertools.permutations(range(m))))
    c = [0] * len(permutations)
    # Slide a window of m points (spaced by `delay`) over the series and
    # count how often each ordinal pattern occurs.
    for i in range(n - delay * (m - 1)):
        sorted_index_array = np.argsort(time_series[i:i + delay * m:delay],
                                        kind='quicksort')
        for j in range(len(permutations)):
            if np.array_equal(permutations[j], sorted_index_array):
                c[j] += 1
    # Drop patterns that never occur, normalize to probabilities, and
    # compute the Shannon entropy of the pattern distribution.
    c = [element for element in c if element != 0]
    p = np.divide(np.array(c), float(sum(c)))
    pe = -np.sum(p * np.log(p))
    return pe
```
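To make the behavior of the function concrete, here is a minimal sanity-check sketch. The function is repeated so the snippet runs standalone; the test series, the seed, and the final normalization into a redundancy estimate (one minus permutation entropy over its maximum log(m!), in the spirit of Garland et al.) are illustrative assumptions, not part of the original code.

```python
import itertools
import math

import numpy as np


def permutation_entropy(time_series, m, delay):
    """Permutation entropy (repeated here so this snippet is self-contained)."""
    permutations = np.array(list(itertools.permutations(range(m))))
    c = [0] * len(permutations)
    for i in range(len(time_series) - delay * (m - 1)):
        sorted_index_array = np.argsort(time_series[i:i + delay * m:delay])
        for j in range(len(permutations)):
            if np.array_equal(permutations[j], sorted_index_array):
                c[j] += 1
    c = [element for element in c if element != 0]
    p = np.array(c) / float(sum(c))
    return -np.sum(p * np.log(p))


# A strictly increasing series has a single ordinal pattern, so its
# permutation entropy is zero: maximally redundant, trivially predictable.
trend = np.arange(100, dtype=float)
pe_trend = permutation_entropy(trend, m=3, delay=1)

# White noise visits all m! = 6 ordinal patterns roughly uniformly, so its
# permutation entropy approaches the maximum, log(m!).
rng = np.random.default_rng(42)
noise = rng.standard_normal(1000)
pe_noise = permutation_entropy(noise, m=3, delay=1)

# Hypothetical redundancy estimate: one minus the normalized entropy.
# Near zero for noise (no exploitable structure), near one for the trend.
redundancy_noise = 1.0 - pe_noise / np.log(math.factorial(3))
redundancy_trend = 1.0 - pe_trend / np.log(math.factorial(3))
```

On these two extremes the measure behaves as the prose above suggests: the deterministic trend is fully redundant, while the noise series leaves almost nothing to predict.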