

First of all, I want to thank you for this excellent job. Basically, we have a model trained in Spark MLlib and deployed in a Spark environment, and we would love to provide explainability under the same environment. The integrated XGBoost solution works fine. But I found it would take a lot of memory and I couldn't run it through with a scikit-learn gradient boosting tree model, while I think that could be alright because SHAP has a model-agnostic mode. It would be great if the model-agnostic mode of SHAP were memory friendly.

I don't have any plans for that since I am not a Spark user. What workflow do you have in mind that you would like SHAP to integrate with? I might be able to suggest something in a specific case (though I don't know Spark well).
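For context, a minimal sketch of how the model-agnostic mode (KernelExplainer) is usually kept memory-conscious: summarize the background data and explain rows in small batches. The model, data sizes, and nsamples below are placeholders, not values from this thread.

```python
# Minimal sketch: memory-conscious use of SHAP's model-agnostic KernelExplainer.
# The model, data, and parameter values are placeholders.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(10_000, 20)                 # placeholder feature matrix
y = 2.0 * X[:, 0] + X[:, 1]                    # placeholder target
model = GradientBoostingRegressor().fit(X, y)

# Summarizing the background set (here: 50 k-means centroids) instead of
# passing the full training data is what keeps the memory footprint small.
background = shap.kmeans(X, 50)
explainer = shap.KernelExplainer(model.predict, background)

# Explain a small batch at a time rather than the whole dataset in one call.
shap_values = explainer.shap_values(X[:100], nsamples=200)
print(shap_values.shape)  # (100, 20): one contribution per feature per row
```

How "memory friendly" this ends up being is governed mostly by the background size and the nsamples setting, which is why the workflow question above matters.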


The problem is what was mentioned: you can build an explainer based on pyspark, but you cannot feed your explainer with a pyspark DataFrame. Theoretically, one could define a UserDefinedFunction, collect the features from Spark to get a numpy or a pandas object, run your explainer, and let Spark do the dispatching. Unfortunately, to do so, the objects needed for the computation must be serializable (through pickle) in order to be dispatched to every node, and an explainer is not. A possible solution is to expose a serialize and an unserialize method on an explainer: the serialize should return an object made of dictionaries, lists, and whatever other "basic" Python types, and the unserialize of course takes the serialized object and builds a proper explainer from it.
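To make that concrete, here is a hedged sketch of both ideas together, assuming a KernelExplainer is the thing being wrapped: serialize_explainer and unserialize_explainer are hypothetical helpers standing in for the proposed serialize/unserialize API (they are not part of SHAP), and mapInPandas is used as one pandas-friendly flavor of UDF that rebuilds the explainer on each executor, so only picklable ingredients cross the driver/executor boundary. Model, column names, and sizes are placeholders.

```python
# Hedged sketch of the proposal above. serialize_explainer / unserialize_explainer
# are hypothetical helpers, not an existing SHAP API; model, columns, and sizes
# are placeholders.
import numpy as np
import pandas as pd
import shap
from pyspark.sql import SparkSession
from sklearn.ensemble import GradientBoostingRegressor

def serialize_explainer(model, background):
    # Reduce the explainer to "basic", picklable pieces, as the proposal suggests.
    return {"model": model, "background": np.asarray(background)}

def unserialize_explainer(payload):
    # Rebuild a proper explainer from the serialized pieces (runs on the executors).
    return shap.KernelExplainer(payload["model"].predict, payload["background"])

spark = SparkSession.builder.getOrCreate()

# Driver side: placeholder data and model, then the picklable payload.
cols = ["f0", "f1", "f2", "f3"]
X = np.random.rand(1_000, len(cols))
y = 2.0 * X[:, 0] + X[:, 1]
model = GradientBoostingRegressor().fit(X, y)
payload = serialize_explainer(model, X[:50])   # small background keeps memory low

sdf = spark.createDataFrame(pd.DataFrame(X, columns=cols))

def explain_partition(batches):
    # Executor side: Spark pickles the payload into this closure; the explainer
    # itself is rebuilt here, per the point above that explainers are not
    # reliably picklable.
    explainer = unserialize_explainer(payload)
    for pdf in batches:
        contrib = explainer.shap_values(pdf[cols].values, nsamples=100)
        yield pd.DataFrame(contrib, columns=cols)

shap_df = sdf.mapInPandas(explain_partition, schema=sdf.schema)
shap_df.show(5)
```

This keeps the workflow entirely inside Spark: the explainer is fed plain numpy batches extracted from the pyspark DataFrame, and Spark still does the dispatching.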
