Same code, very different accuracy on Windows vs. Ubuntu (Keras/TensorFlow)

April 18, 2019 · Source: Eldar M.
import pandas as pd 
import numpy as np 
from keras.models import Sequential 
from keras.layers import Dense 
from keras.layers import Dropout 
from keras.layers import LSTM 
from keras.optimizers import Adam 
from sklearn.preprocessing import MinMaxScaler 

def create_dataset(dataset, datasetClass, look_back): 
    dataX, dataY = [], [] 
    for i in range(len(dataset)-look_back-1): 
        a = dataset[i:(i+look_back), 0] 
        dataX.append(a) 
        dataY.append(datasetClass[:, (i+look_back):(i+look_back+1)]) 

    return np.array(dataX), np.array(dataY) 

def one_hot_encode(dataset): 
    data = np.zeros((11, len(dataset)), dtype='int') 
    for i in range(len(dataset)): 
        data[dataset[i]-1, i] = 1 
    return data 

#Set a seed for repeatable results 
np.random.seed(12) 


dataframe = pd.read_csv('time-series.csv', usecols=[1], engine='python') 
dataset = dataframe.values 
dataset = dataset.astype('float32') 

dataframeClass = pd.read_csv('time-series-as-class.csv', usecols=[1], engine='python') 
datasetClass = dataframeClass.values 
datasetClass = datasetClass.astype('int') 

datasetClass = one_hot_encode(datasetClass) 

#normalize input vals 
scaler = MinMaxScaler(feature_range=(0, 1)) 
dataset = scaler.fit_transform(dataset) 


#separate to test/train 
train_size = int(len(dataset) * 0.67) 
test_size = len(dataset) - train_size 
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :] 
trainClass, testClass = datasetClass[:, 0:train_size], datasetClass[:, train_size:len(dataset)] 

#set up sliding windows 
look_back = 150 
trainX, trainY = create_dataset(train, trainClass, look_back) 
testX, testY = create_dataset(test, testClass, look_back) 


#reformat for proper passing to nn 
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1])) 
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1])) 
trainY = np.squeeze(trainY, 2) 
testY = np.squeeze(testY, 2) 

# create and fit the LSTM network 
model = Sequential() 
model.add(LSTM(15, input_shape=(1,look_back))) 
model.add(Dense(22,activation='tanh')) 
model.add(Dropout(0.2)) 
model.add(Dense(11,activation='softmax')) 
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['categorical_accuracy']) 
print(model.summary()) 
model.fit(trainX, trainY, epochs=90, batch_size=1, verbose=2) 
# make predictions 
trainPredict = model.predict(trainX) 
testPredict = model.predict(testX) 

I have run this on both Ubuntu and Windows. On Windows I tested Keras v2.0.4 and v2.0.8; on Ubuntu, v2.0.5 (the latest version available through conda).

On Windows, accuracy is 17% and the categorical crossentropy sits at ~2; it converges slowly, but it consistently starts from there.

On Ubuntu, accuracy is 98% and the categorical crossentropy appears to be 0; it doesn't actually change.

The only difference in the code is the path to the CSV files; the CSV files themselves are exactly the same. What could cause such a huge difference?

If the difference were one or two percentage points, I could write it off as dropout or TF's random initialization, but as it stands it is far too large to be pure chance.

Edit: The solution turned out to be fixing the categorical CSV files. Although they were UTF-8, apparently when they are created on Windows something else is needed for them to play nicely with Linux. I'm not sure whether I'm allowed to mark my own answer as "accepted".
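A quick sanity check on the encoded labels would have surfaced this kind of silent failure before training. A minimal sketch; the helper name `check_one_hot` is my own, and it assumes the `(n_classes, n_samples)` layout produced by `one_hot_encode` above:

```python
import numpy as np

def check_one_hot(labels):
    """Verify every column of an (n_classes, n_samples) matrix contains
    exactly one 1 -- all-zero columns indicate a label-parsing bug."""
    col_sums = labels.sum(axis=0)
    bad = np.flatnonzero(col_sums != 1)
    if bad.size:
        raise ValueError("%d samples are not one-hot encoded "
                         "(first bad index: %d)" % (bad.size, bad[0]))

# A healthy matrix passes silently...
ok = np.array([[1, 0],
               [0, 1],
               [0, 0]])          # 3 classes, 2 samples
check_one_hot(ok)

# ...while an all-zero matrix (the symptom described below) is caught.
try:
    check_one_hot(np.zeros((3, 2), dtype=int))
except ValueError as e:
    print(e)
```

Calling this on `trainY`/`testY` right after `one_hot_encode` would have raised immediately on the broken files instead of silently reporting 98% accuracy.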


=========== The solution follows:

The problem turned out to be in the CSV files, which were originally ported over from Windows. Even though they were saved in UTF-8 format, I still had to open them in LibreOffice and save them again as Linux CSV files.

In their initial state they did not fail to load, but they were not one-hot encoded correctly, so all of the one-hot vectors ended up as 0. Apparently that is what produced the very high "accuracy".
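The LibreOffice round-trip can also be done in code. A minimal sketch, under the assumption that the only problem is Windows (CRLF) line endings; the function name `normalize_line_endings` and the file names are my own, for illustration:

```python
def normalize_line_endings(src_path, dst_path):
    """Rewrite a text file with Unix (LF) line endings."""
    with open(src_path, "rb") as f:
        data = f.read()
    # Convert CRLF (Windows) and bare CR (classic Mac) to LF.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(dst_path, "wb") as f:
        f.write(data)

# Example: a CRLF file written on Windows becomes a plain LF file.
with open("labels-windows.csv", "wb") as f:
    f.write(b"idx,class\r\n0,3\r\n1,7\r\n")
normalize_line_endings("labels-windows.csv", "labels-unix.csv")
```

Opening the files in binary mode sidesteps Python's own universal-newline translation, so the bytes written to disk are exactly the bytes shown.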

    Original author: Eldar M.
    Original source: https://stackoverflow.com/q/46672864