APIs/Parameters
We provide two kinds of APIs: a command-line interface (CLI) and a Python interface. With the CLI, users only need to prepare a configuration file specifying the parameters and run FedTree with a one-line command. With the Python interface, users can create an ``FLClassifier`` or ``FLRegressor`` object with the parameters and use it in the scikit-learn style. The parameters are listed below.
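As a rough illustration of the CLI workflow, the sketch below renders a Python dict of parameters into a configuration file. It assumes the file holds one ``key=value`` pair per line (an assumption; compare with the example configuration files in your FedTree checkout). The helper ``render_config`` and the dataset paths are hypothetical.

```python
from __future__ import annotations


def render_config(params: dict) -> str:
    """Render parameters as one 'key=value' line each (hypothetical helper).

    ASSUMPTION: FedTree's configuration file uses this key=value layout.
    List values (e.g. several party datasets for ``data``) are joined with
    commas, following the comma-separated convention described for ``data``.
    """
    lines = []
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_config({
        "mode": "horizontal",
        "num_parties": 2,
        "data": ["./party0.libsvm", "./party1.libsvm"],  # hypothetical paths
        "n_trees": 40,
        "depth": 6,
        "privacy_method": "sa",
    })
    print(text, end="")
```

The printed text would be saved to a file and passed to the FedTree training command (presumably a ``FedTree-train`` binary, by analogy with ``FedTree-predict``; check your build).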
Parameters for Federated Setting
mode
[default=``horizontal``, type=string]
``horizontal``: horizontal federated learning
``vertical``: vertical federated learning
num_parties
[default=10, type=int, alias: ``num_clients``, ``num_devices``] Number of parties.
partition
[default=0, type=bool]
``0``: each party has a prepared local dataset.
``1``: there is a global dataset, and users require FedTree to partition it into multiple subsets to simulate the federated setting.
partition_mode
[default=``horizontal``, type=string]
``horizontal``: horizontal data partitioning
``vertical``: vertical data partitioning
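To make the difference between the two partition modes concrete, here is a minimal sketch of what ``partition=1`` means conceptually: horizontal partitioning gives each party a subset of the instances (rows), while vertical partitioning gives each party a subset of the features (columns). This is not FedTree's partitioning code; the contiguous, nearly equal chunks are an assumption, and labels are omitted (in vertical FL typically only one party holds them).

```python
from __future__ import annotations


def split_ranges(n: int, k: int) -> list[range]:
    """Split range(n) into k contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, k)
    out, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out


def horizontal_partition(rows: list[list[float]], k: int) -> list[list[list[float]]]:
    """Each party receives a subset of the instances with all features."""
    return [[rows[i] for i in r] for r in split_ranges(len(rows), k)]


def vertical_partition(rows: list[list[float]], k: int) -> list[list[list[float]]]:
    """Each party receives all instances but only a subset of the features."""
    n_features = len(rows[0])
    return [[[row[j] for j in r] for row in rows]
            for r in split_ranges(n_features, k)]
```

For a 4 x 3 matrix and two parties, the horizontal split yields two 2 x 3 blocks, while the vertical split yields a 4 x 2 block and a 4 x 1 block.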
ip_address
[default=``localhost``, type=string, alias: ``server_ip_address``] The IP address of the server in distributed FedTree.
data_format
[default=``libsvm``, type=string]
n_features
[default=-1, type=int] The number of features in the datasets. It must be specified when running horizontal FedTree on sparse datasets.
propose_split
[default=``server``, type=string]
``server``: in horizontal FedTree, the server proposes candidate split points according to the range of each feature.
``party``: in horizontal FedTree, the parties propose possible split points; the server then merges them and samples at most ``max_num_bin`` candidate split points.
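A simplified sketch of the two ``propose_split`` strategies for a single feature follows. It assumes that "server" mode places equal-width cut points inside the global [min, max] range, and that "party" mode merges the parties' candidates and keeps evenly spaced elements of the sorted union instead of sampling randomly. Both choices, and the function names, are illustrative assumptions; FedTree's real procedure may differ.

```python
from __future__ import annotations


def server_propose(feature_min: float, feature_max: float, max_num_bin: int) -> list[float]:
    """'server' mode sketch: cut [min, max] into max_num_bin equal-width bins
    and return the max_num_bin - 1 interior boundaries as split candidates."""
    width = (feature_max - feature_min) / max_num_bin
    return [feature_min + width * i for i in range(1, max_num_bin)]


def server_merge(party_candidates: list[list[float]], max_num_bin: int) -> list[float]:
    """'party' mode sketch: merge the parties' proposed split points and keep
    at most max_num_bin of them (evenly spaced picks from the sorted union)."""
    merged = sorted(set().union(*party_candidates))
    if len(merged) <= max_num_bin:
        return merged
    step = len(merged) / max_num_bin  # > 1, so the picked indices are distinct
    return [merged[int(i * step)] for i in range(max_num_bin)]
```

The server-side mode needs only each feature's range, whereas the party-side mode lets the candidates follow each party's local data distribution at the cost of sending more candidate points.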
key_length
[default=512, type=int] The number of bits of the key used in encryption.
pred_output
[default=``predictions.txt``, type=string] The file in which to save the predicted labels when using ``FedTree-predict``.
Parameters for GBDTs
data
[default=``../dataset/test_dataset.txt``, type=string, alias: ``path``] The path to the training dataset(s). In simulation, if multiple datasets need to be loaded, each representing one party, specify their paths separated by commas.
model_path
[default=``fedtree.model``, type=string] The path to save/load the model.
verbose
[default=1, type=int] Verbosity of the printed information: 0 for silence, 1 for key information, and 2 for more detailed information.
depth
[default=6, type=int] The maximum depth of the decision trees. Shallow trees tend to generalize better, while deep trees are more likely to overfit the training data.
n_trees
[default=40, type=int] The number of training iterations. ``n_trees`` equals the number of trees in the GBDT model.
max_num_bin
[default=32, type=int] The maximum number of bins in a histogram. The value must be smaller than 256.
learning_rate
[default=1, type=float, alias: ``eta``] Valid domain: [0,1]. This option sets the weight of each newly trained tree. Use ``eta < 1`` to mitigate overfitting.
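The effect of ``learning_rate`` can be seen in a toy boosting loop. This sketch assumes squared-error loss and "trees" that are a single leaf predicting the mean residual (real GBDT trees split on features); the function name ``boost_constant`` is hypothetical. Each new tree's output is multiplied by ``eta`` before being added to the ensemble.

```python
from __future__ import annotations


def boost_constant(targets: list[float], n_trees: int, eta: float) -> list[float]:
    """Return the ensemble prediction after each boosting round."""
    prediction = 0.0
    history = []
    for _ in range(n_trees):
        residuals = [y - prediction for y in targets]   # negative gradients of squared loss
        leaf_value = sum(residuals) / len(residuals)    # output of the new one-leaf "tree"
        prediction += eta * leaf_value                  # shrink the new tree by eta
        history.append(prediction)
    return history
```

With a single target of 10, ``eta=1`` reaches 10 after the first tree, whereas ``eta=0.5`` gives 5.0, 7.5 and 8.75 over three rounds: each tree has less influence, so fitting is more gradual (usually paired with more trees).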
objective
[default=``reg:linear``, type=string] Valid options include ``reg:linear``, ``reg:logistic``, ``binary:logistic``, ``multi:softprob``, ``multi:softmax``, ``rank:pairwise`` and ``rank:ndcg``. ``reg:linear`` is for regression. ``reg:logistic`` and ``binary:logistic`` are for binary classification. ``multi:softprob`` and ``multi:softmax`` are for multi-class classification: ``multi:softprob`` outputs the probability of each class, while ``multi:softmax`` outputs only the label. ``rank:pairwise`` and ``rank:ndcg`` are for ranking problems.
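The sketch below shows how the classification objectives typically turn raw tree scores into outputs, assuming the standard XGBoost-style definitions (FedTree's exact post-processing is an assumption here): a sigmoid for the binary objectives, and for ``num_class`` classes a softmax over one raw score per class, returned as probabilities (``multi:softprob``) or reduced to the arg-max label (``multi:softmax``).

```python
from __future__ import annotations

import math


def sigmoid(score: float) -> float:
    """binary:logistic / reg:logistic: map a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def softprob(scores: list[float]) -> list[float]:
    """multi:softprob: softmax over per-class raw scores (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_label(scores: list[float]) -> int:
    """multi:softmax: return only the index of the most probable class."""
    probs = softprob(scores)
    return max(range(len(probs)), key=probs.__getitem__)
```

For raw scores ``[0.1, 2.0, -1.0]``, ``softprob`` returns three probabilities summing to 1, while ``softmax_label`` returns class 1.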
num_class
[default=1, type=int] The number of classes in multi-class classification.
min_child_weight
[default=1, type=float] The minimum sum of instance weights (measured by the second-order derivative) needed in a child node.
lambda
[default=1, type=float, alias: ``lambda_tgbm``, ``reg_lambda``] L2 regularization term on weights.
gamma
[default=1, type=float, alias: ``min_split_loss``] The minimum loss reduction required to make a further split on a leaf node of the tree. ``gamma`` is used in the pruning stage.
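To show how ``min_child_weight``, ``lambda`` and ``gamma`` interact, here is a sketch using the standard second-order (XGBoost-style) formulas, where G and H are the sums of first- and second-order gradients in a node. That FedTree uses exactly these constants, and that pruning keeps a split only when its gain reaches ``gamma``, are assumptions; the function names are hypothetical.

```python
def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf value -G / (H + lambda); a larger lambda shrinks it toward 0."""
    return -G / (H + lam)


def split_gain(GL: float, HL: float, GR: float, HR: float, lam: float) -> float:
    """Loss reduction from splitting a node into left/right children."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR))


def keep_split(GL: float, HL: float, GR: float, HR: float,
               lam: float, gamma: float, min_child_weight: float) -> bool:
    """Reject children whose hessian sum is below min_child_weight, then keep
    the split only if its loss reduction reaches gamma (the pruning check)."""
    if HL < min_child_weight or HR < min_child_weight:
        return False
    return split_gain(GL, HL, GR, HR, lam) >= gamma
```

For G_L = -4, H_L = 2, G_R = 4, H_R = 2 and lambda = 1, the gain is 16/3 (about 5.33): the split survives ``gamma=1`` but is pruned with ``gamma=6``, and it is rejected outright when ``min_child_weight=3``.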
Parameters for Privacy Protection
privacy_method
[default=``none``, type=string]
``none``: no additional method is used to protect the communicated messages (raw data is not transferred).
``he``: use homomorphic encryption to protect the communicated messages (for vertical FedTree).
``sa``: use secure aggregation to protect the communicated messages (for horizontal FedTree).
``dp``: use differential privacy to protect the communicated messages (currently only works for vertical FL with single-machine simulation).
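The idea behind secure aggregation (``sa``) can be illustrated with pairwise additive masking: each pair of parties shares a random mask that one adds and the other subtracts, so the server learns only the sum of the updates. This is a generic textbook sketch, not FedTree's protocol. A seeded RNG stands in for the pairwise key agreement a real protocol would use, dropout handling is omitted, and values are assumed to be integers (e.g. fixed-point encoded) modulo 2^32.

```python
from __future__ import annotations

import random

MOD = 2 ** 32  # all arithmetic is modulo 2^32 on integer-encoded values


def pairwise_masks(n_parties: int, length: int, seed: int = 0) -> dict[tuple[int, int], list[int]]:
    """One random mask per pair (i, j) with i < j. The seeded RNG stands in for
    the secret each pair would derive via key agreement in a real protocol."""
    rng = random.Random(seed)
    return {
        (i, j): [rng.randrange(MOD) for _ in range(length)]
        for i in range(n_parties)
        for j in range(i + 1, n_parties)
    }


def mask_update(party: int, values: list[int], masks: dict[tuple[int, int], list[int]]) -> list[int]:
    """Party i adds mask (i, j) for every j > i and subtracts mask (j, i) for every j < i."""
    out = [v % MOD for v in values]
    for (i, j), mask in masks.items():
        if party == i:
            out = [(o + m) % MOD for o, m in zip(out, mask)]
        elif party == j:
            out = [(o - m) % MOD for o, m in zip(out, mask)]
    return out


def aggregate(masked_updates: list[list[int]]) -> list[int]:
    """Server side: every mask is added once and subtracted once, so they cancel."""
    return [sum(column) % MOD for column in zip(*masked_updates)]
```

Each masked update on its own looks uniformly random to the server, yet summing the masked updates of three parties holding ``[1, 2, 3]``, ``[4, 5, 6]`` and ``[7, 8, 9]`` recovers ``[12, 15, 18]``.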
privacy_budget
[default=10, type=float] The total privacy budget when using differential privacy.
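As a generic illustration of how a privacy budget translates into noise, the sketch below implements the Laplace mechanism: a value with sensitivity s released under budget epsilon receives Laplace noise of scale s / epsilon, so a smaller budget means more noise. How FedTree spends ``privacy_budget`` is not described here; the even split across trees in ``per_tree_budget`` is a HYPOTHETICAL allocation for illustration only.

```python
import random


def per_tree_budget(privacy_budget: float, n_trees: int) -> float:
    """HYPOTHETICAL allocation: split the total budget evenly over the trees
    (sequential composition). FedTree's actual allocation may differ."""
    return privacy_budget / n_trees


def laplace_noise(sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Laplace(0, b) with b = sensitivity / epsilon, sampled as the difference
    of two i.i.d. exponential variables with mean b."""
    scale = sensitivity / epsilon
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism: release the value plus calibrated Laplace noise."""
    return value + laplace_noise(sensitivity, epsilon, rng)
```

Under this hypothetical allocation, the default budget of 10 over the default 40 trees leaves epsilon = 0.25 per tree, i.e. a noise scale of 4 for a sensitivity-1 quantity.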