iT邦幫忙


How do I save a model in PySpark?

I'm currently using Python 2.7, Java 1.8, and Scala 2.11.12.

I want to save my trained model, but when I run the code I get the error below.
This is the code I entered:
from pyspark.ml.regression import RandomForestRegressor

rf = RandomForestRegressor(featuresCol="features",
                           labelCol=df2.columns[2],
                           numTrees=200,
                           featureSubsetStrategy="auto",
                           minInstancesPerNode=1)
model = rf.fit(XX)
path = 'C:/Users/user/Desktop/123456'
model.save(path)

The error message is as follows:
An error occurred while calling o163.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:96)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)

Does anyone know how to fix this? Q___Q

froce, iT Expert Level 2 ‧ 2018-05-12 20:36:11
Is your debug output complete?
Traceback (most recent call last):

File "<ipython-input-21-14af60d28c4b>", line 1, in <module>
runfile('C:/Users/user/Desktop/0512/05012.py', wdir='C:/Users/user/Desktop/0512')

File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 86, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/user/Desktop/0512/05012.py", line 59, in <module>
model.save(path)

File "C:\spark-2.3.0\python\lib\pyspark\ml\util.py", line 204, in save
self.write().save(path)

File "C:\spark-2.3.0\python\lib\pyspark\ml\util.py", line 165, in save
self._jwrite.save(path)
File "C:\spark-2.3.0\python\lib\py4j\java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)

File "C:\spark-2.3.0\python\lib\pyspark\sql\utils.py", line 63, in deco
return f(*a, **kw)

File "C:\spark-2.3.0\python\lib\py4j\protocol.py", line 320, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o163.save.
: java.io.IOException: Path C:/Users/user/Desktop/123456 already exists. To overwrite it, please use write.overwrite().save(path) for Scala and use write().overwrite().save(path) for Java and Python.
at org.apache.spark.ml.util.FileSystemOverwrite.handleOverwrite(ReadWrite.scala:503)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:102)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

This is the full traceback. It looks like a path problem. Is there a correct way to set the path? I checked the official docs, but they're a bit unclear......

1 answer

darwin0616
iT Newbie Level 4 ‧ 2018-05-14 09:27:04

Judging from the error message, it looks like either Hadoop isn't running or the path format is wrong!
org.apache.spark.internal.io.SparkHadoopWriter
If the HADOOP_CONF_DIR environment variable is set, the default path prefix will be hdfs://...
If it isn't set, you can try the default file://... prefix.
But your save path is C:/Users/user/Desktop/123456, so your Spark and Hadoop are deployed on Windows???
Since I deploy on CentOS 7, I'm not sure whether it should be hdfs://C:/Users/user/Desktop/123456 or file://C:/Users/user/Desktop/123456
(I've never handled a path with a C: drive letter; on Linux they are all /.../...)
Also check whether start-dfs.cmd (not sure that's the name on Windows; on Linux it's start-dfs.sh) has been started!
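On the path-format point above: a Windows drive path can be turned into the file:// URI form that Hadoop-style file systems expect using only the standard library (pure path manipulation, no Spark needed; the path below is the one from the question):

```python
from pathlib import PureWindowsPath

# Convert a Windows drive path into a file:// URI. PureWindowsPath does no
# filesystem access, so this works the same on any OS.
uri = PureWindowsPath("C:/Users/user/Desktop/123456").as_uri()
print(uri)  # file:///C:/Users/user/Desktop/123456
```

Note the triple slash: the URI form is file:///C:/..., not file://C:/....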

For deploying on Windows you can refer to Apache Spark in 24 Hours (it feels like Microsoft is just fussy XD)

Thanks for the reference! I'll need to study it some more! I later changed the path to file://~~~ and it still didn't work. A quick Google search shows that you don't necessarily need Hadoop running to use Spark! Then I thought it might be a version problem, since what I use in the lab is the standalone version.....
