last update: 2019-07-20

Introduction and Installation

PySyft is a PyTorch extension for secure and private deep learning that has gained popularity recently. PySyft’s secure multiparty computation (SMPC) approach is explained in great detail in Ryffel et al. (2019). Let’s have a look at PySyft.

!!This is still experimental and not for production use!!

Do NOT use this code to protect data (private or otherwise) - at present it is very insecure. Come back in a couple months.

https://github.com/OpenMined/PySyft/blob/master/README.md#disclaimer

If you are interested in a quick overview of available functions, then have a look at my PySyft cheat sheet.

PySyft enables us to use

  • federated learning (FL),
  • multi-party computation (MPC),
  • and differential privacy (DP)

with PyTorch and is available under the Apache License 2.0. It basically implements a protocol for communication (command and control) and data transfer with virtual workers. Some parts are also connected to TensorFlow (not covered in this blog post).

Installation is fairly easy since it is available via pip:

$ pip install syft

However, I recommend installing it in a conda environment and installing PyTorch first to get CUDA support without additional work.

$ conda create -n pysyft python=3
$ conda activate pysyft
(pysyft) $ conda install -c pytorch pytorch
(pysyft) $ conda install nb_conda jupyter notebook # etc. if you need it
(pysyft) $ pip install syft
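
To quickly verify the installation (a small check; the printed version string depends on the release you installed):

(pysyft) $ python -c "import torch, syft; print(syft.__version__)"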

PySyft implements a communication protocol between a master node and network (and virtual) workers. Further, it provides a tensor chain to subdivide a master tensor, which itself remains a PyTorch tensor. The tensor chain builds on two kinds of subclasses of the SyftTensor class (see the short sketch after the list):

  • LocalTensor
  • PointerTensor
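
As a minimal sketch of this chain (anticipating the code in the next section; the .child attribute is how the chain is exposed in the version I used, so treat the exact attribute name as an assumption):

import torch
import syft as sy

hook = sy.TorchHook(torch)                  # hook PyTorch (explained in the next section)
worker = sy.VirtualWorker(hook, id="demo")

x = torch.tensor([1, 2, 3])
x_ptr = x.send(worker)                      # wrapper -> PointerTensor -> remote tensor
print(x_ptr)                                # (Wrapper)>[PointerTensor | me:... -> demo:...]
print(type(x_ptr.child).__name__)           # 'PointerTensor' (assumption: chain exposed via .child)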

First Look

I worked through this over multiple sessions. Therefore, IDs (numbers) vary between the snippets.

Let’s work through some of the example tutorials. We’ll run everything locally and therefore stick with virtual workers.

Let’s load PySyft:

import syft as sy
import torch

hook = sy.TorchHook(torch)

With sy.TorchHook, we override methods on PyTorch tensors, which makes PySyft’s extended functions available on every PyTorch tensor.
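
A quick way to see this (a small check, not an exhaustive list of the added methods):

x = torch.tensor([1, 2, 3])
print(hasattr(x, "send"), hasattr(x, "get"))   # True True - PySyft's methods are now available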

Let’s create virtual workers.

worker1 = sy.VirtualWorker(hook, id="worker1")
worker2 = sy.VirtualWorker(hook, id="worker2")

sy.VirtualWorker requires a hook; everything else is optional. However, it is useful to assign a proper, unique id. Further, these workers are automatically added to a list of known workers. If we don’t want this, we have to set auto_add=False. The virtual workers are created, but they don’t have any objects attached to them yet:

print(worker1._objects)
print(worker2._objects)
{}
{}
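
As an aside, a sketch of the auto_add behaviour mentioned above (assuming the keyword works as described; by default the worker is registered):

# This worker is not added to the list of known workers (illustrative only).
worker_tmp = sy.VirtualWorker(hook, id="worker_tmp", auto_add=False)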

We can create several tensors now and send them to our workers:

a = torch.tensor([1,2,3])
b = torch.tensor([4,5,6])
c = torch.tensor([7,8.9])

# tensor.send is available due to our hooking into PyTorch
# (hook;sy.TorchHook)

a_at_worker1 = a.send(worker1)
b_at_worker2 = b.send(worker2)
c_at_worker2 = c.send(worker2)


print(a_at_worker1)
print(b_at_worker2)
print(c_at_worker2)
(Wrapper)>[PointerTensor | me:84093131065 -> worker1:54110135230]
(Wrapper)>[PointerTensor | me:64005833858 -> worker2:55192932033]
(Wrapper)>[PointerTensor | me:1797084545 -> worker2:87033834925]

Again, let’s have a look at the workers’ objects:

print(worker1._objects)
print(worker2._objects)
{53337501310: tensor([1, 2, 3])}
{25139704315: tensor([4, 5, 6]), 95653586915: tensor([7.0000, 8.9000])}

We can also operate on these pointers; the computation happens on the remote worker, and we receive a new pointer to the result:

d_at_worker_2 = b_at_worker2 + b_at_worker2
print(d_at_worker_2)
print(worker2._objects)
(Wrapper)>[PointerTensor | me:74409772389 -> worker2:9843770559]
{25139704315: tensor([4, 5, 6]), 95653586915: tensor([7.0000, 8.9000]), 9843770559: tensor([ 8, 10, 12])}

Let’s see what happens if we add two tensors on different workers:

e_at_ = a_at_worker1 + b_at_worker2
print(e_at_)

Results from my first attempt (these results are wrong; this should not have happened, see below):

(Wrapper)>[PointerTensor | me:20917314228 -> worker2:15675838766]

At first glance, it looks as if the resulting tensor is created on the worker named last in this chain. But wait.

print(worker1._objects)
print(worker2._objects)
{53337501310: tensor([1, 2, 3])}
{25139704315: tensor([4, 5, 6]), 95653586915: tensor([7.0000, 8.9000]), 9843770559: tensor([ 8, 10, 12]), 15675838766: tensor([ 8, 10, 12])}

Apparently, it is not that simple. It looks like the last valid operation (d_at_worker_2 = b_at_worker2 + b_at_worker2) was simply repeated; otherwise we would have ended up with 15675838766: tensor([5, 7, 9]).


I repeated this after restarting the kernel and got the following exception:

TensorsNotCollocatedException: You tried to call __add__ involving two tensors which are not on the same machine! One tensor is on <VirtualWorker id:worker1 #objects:1> while the other is on <VirtualWorker id:worker2 #objects:3>. Use a combination of .move(), .get(), and/or .send() to co-locate them to the same machine.

That makes a lot more sense. (Unfortunately, I couldn’t really trace back the error mentioned above.)

If we don’t know anything about the workers, but know the variable names of our distributed tensor objects, then we can access their location using .location. This yields:

print(d_at_worker_2.location)
<VirtualWorker id:worker2 #objects:3>

Well, it tells us a bit more than requested: we also get information about how many objects are stored at worker2. IMHO this is a security nightmare (but PySyft comes with an upfront warning that it is insecure and still in heavy development). On the other hand, we are accessing this information from an owner’s perspective (d_at_worker_2.owner yields <VirtualWorker id:me #objects:0>), so the implications are rather small.

As a check: the information above is the same as what we get by printing a virtual worker directly:

print(worker1)
print(worker2)
<VirtualWorker id:worker1 #objects:1>
<VirtualWorker id:worker2 #objects:3>
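
To see the owner’s side mentioned above, we can print the pointer’s owner and location directly (a quick check; the owner output is the one quoted above):

print(d_at_worker_2.owner)      # <VirtualWorker id:me #objects:0>
print(d_at_worker_2.location)   # <VirtualWorker id:worker2 #objects:3>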

The distributed computation problem mentioned above leads us to the topic of moving data (tensors). We can get a tensor from a worker (SyftTensor on a worker) using .get().

d_at_worker_2.get()
tensor([ 8, 10, 12])

Wait. We didn’t just receive the tensor, we also removed it from the worker:

worker2._objects
{25002531774: tensor([4, 5, 6]), 99209007781: tensor([7.0000, 8.9000])}

But how do we add two tensors that are placed on different workers? First, we can make the data on another worker available using pointers.

b_at_worker1 = b_at_worker2.send(worker1)
print(b_at_worker1)
print(worker1._objects)
print(worker2._objects)
(Wrapper)>[PointerTensor | me:82187100086 -> worker1:94191098779]

{8605408485: tensor([1, 2, 3]),
 94191098779: (Wrapper)>[PointerTensor | worker1:94191098779 -> worker2:25002531774]}

{25002531774: tensor([4, 5, 6]), 99209007781: tensor([7.0000, 8.9000])}

worker1 now contains a pointer that points to the tensor on worker2, while nothing changes on worker2. .get() works on pointers exactly the same way as it did on tensors above.

Of course, we can move data from one worker to another.

b_at_worker1 = b_at_worker2.move(worker1)

print(worker1._objects)
print(worker2._objects)
{8605408485: tensor([1, 2, 3]), 94191098779: tensor([4, 5, 6])}

{99209007781: tensor([7.0000, 8.9000])}

Now we can do some arithmetic on worker1. This was not possible on worker2, because the tensors there are of different sizes. Afterwards, we can move the result around again if we have to.
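
For example, a small sketch using the pointers from above (variable names follow from the previous snippets):

# Both operands now live on worker1, so the addition is executed there.
sum_at_worker1 = a_at_worker1 + b_at_worker1
print(sum_at_worker1)          # a new PointerTensor into worker1
print(sum_at_worker1.get())    # tensor([5, 7, 9])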

Sandbox

There exists a sandbox, or rather a “toolbox”, that automates setting up workers and datasets for experimental purposes.

import torch
import syft as sy
sy.create_sandbox(globals())
Setting up Sandbox...
	- Hooking PyTorch
	- Creating Virtual Workers:
		- bob
		- theo
		- jason
		- alice
		- andy
		- jon
	Storing hook and workers as global variables...
	Loading datasets from SciKit Learn...
		- Boston Housing Dataset
		- Diabetes Dataset
		- Breast Cancer Dataset
		- Digits Dataset
		- Iris Dataset
		- Wine Dataset
		- Linnerud Dataset
	Distributing Datasets Amongst Workers...
	Collecting workers into a VirtualGrid...

(for some reason I get quite a few TensorFlow deprecation messages here)
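
The sandbox injects the workers listed above (bob, theo, …) as well as the hook into the global namespace, so we can use them like the manually created workers (a quick sanity check, not part of the original tutorial):

x = torch.tensor([1, 2, 3]).send(bob)   # bob is a global created by the sandbox
print(x)                                # a PointerTensor into bob
print(x.get())                          # tensor([1, 2, 3])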

Federated (deep) learning

Let’s do something useful and train a deep learning model.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

import syft as sy
hook = sy.TorchHook(torch)
worker1 = sy.VirtualWorker(hook, id="worker1")
worker2 = sy.VirtualWorker(hook, id="worker2")
worker3 = sy.VirtualWorker(hook, id="worker3")
worker4 = sy.VirtualWorker(hook, id="worker4")


class Arguments():
    def __init__(self):
        self.batch_size = 64
        self.test_batch_size = 1000
        self.epochs = 10
        self.lr = 0.01
        self.momentum = 0.5
        self.no_cuda = False
        self.seed = 1
        self.log_interval = 30
        self.save_model = False

args = Arguments()

use_cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)

device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

That’s plain PyTorch. Let’s see how we load a dataset and distribute/decentralize it.

federated_train_loader = sy.FederatedDataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ]))
    .federate((worker1,worker2, worker3, worker4)),
    batch_size=args.batch_size, shuffle=True, **kwargs)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.test_batch_size, shuffle=True, **kwargs)

sy.FederatedDataLoader is responsible for distributing a dataset over the virtual workers. Basically, we use datasets.MNIST(**args).federate((tuple of workers)).
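
As a quick sanity check (a sketch; it relies on the same .location attribute used in the train function below), each batch returned by the federated loader consists of pointers that live on one of the workers:

data, target = next(iter(federated_train_loader))
print(data.location)     # e.g. <VirtualWorker id:worker1 ...>
print(target.location)   # data and target of a batch live on the same worker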

Since we are using PyTorch, we have to set up a network class including its forward function. Further, we have to set up a train and a test function:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

def train(args, model, device, federated_train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(federated_train_loader): # <-- now it is a distributed dataset
        model.send(data.location) # <-- NEW: send the model to the right location
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        model.get() # <-- NEW: get the model back
        if batch_idx % args.log_interval == 0:
            loss = loss.get()  # <-- NEW: get the loss back
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * args.batch_size, len(federated_train_loader) * args.batch_size,
                100. * batch_idx / len(federated_train_loader), loss.item()))

def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.argmax(1, keepdim=True) # get the index of the max log-probability 
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

We have to add a few things to the train function. However, since we are hooked into torch (hook = sy.TorchHook(torch)), the tricky stuff is already taken care of.

Basically, we iterate over the federated dataset (for batch_idx, (data, target) in enumerate(federated_train_loader)), send the model to the location of the current batch (model.send(data.location)), and get the model (model.get()) and the loss (loss = loss.get()) back afterwards. That’s it. The rest is similar to a standard PyTorch use case.

When it comes to writing the test function, we only have to make sure that we sum up the batch losses and correct predictions; testing itself happens locally on the plain test_loader.

%%time
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=args.lr) # TODO momentum is not supported at the moment

for epoch in range(1, args.epochs + 1):
    train(args, model, device, federated_train_loader, optimizer, epoch)
    test(args, model, device, test_loader)


# if (args.save_model):
#     torch.save(model.state_dict(), "mnist_cnn.pt")

We’ll end up with something like this:

Train Epoch: 1 [0/60032 (0%)]	Loss: 2.303694
Train Epoch: 1 [1920/60032 (3%)]	Loss: 2.160355
Train Epoch: 1 [3840/60032 (6%)]	Loss: 1.951651
Train Epoch: 1 [5760/60032 (10%)]	Loss: 1.432061
Train Epoch: 1 [7680/60032 (13%)]	Loss: 0.878812
Train Epoch: 1 [9600/60032 (16%)]	Loss: 0.670369
Train Epoch: 1 [11520/60032 (19%)]	Loss: 0.463988
Train Epoch: 1 [13440/60032 (22%)]	Loss: 0.438397
Train Epoch: 1 [15360/60032 (26%)]	Loss: 0.364026
Train Epoch: 1 [17280/60032 (29%)]	Loss: 0.342540
Train Epoch: 1 [19200/60032 (32%)]	Loss: 0.312856
Train Epoch: 1 [21120/60032 (35%)]	Loss: 0.320529
Train Epoch: 1 [23040/60032 (38%)]	Loss: 0.381732
Train Epoch: 1 [24960/60032 (42%)]	Loss: 0.257132
Train Epoch: 1 [26880/60032 (45%)]	Loss: 0.342614
Train Epoch: 1 [28800/60032 (48%)]	Loss: 0.380054
Train Epoch: 1 [30720/60032 (51%)]	Loss: 0.220602
Train Epoch: 1 [32640/60032 (54%)]	Loss: 0.166660
Train Epoch: 1 [34560/60032 (58%)]	Loss: 0.454778
Train Epoch: 1 [36480/60032 (61%)]	Loss: 0.255775
Train Epoch: 1 [38400/60032 (64%)]	Loss: 0.267324
Train Epoch: 1 [40320/60032 (67%)]	Loss: 0.153567
Train Epoch: 1 [42240/60032 (70%)]	Loss: 0.128651
Train Epoch: 1 [44160/60032 (74%)]	Loss: 0.214276
Train Epoch: 1 [46080/60032 (77%)]	Loss: 0.335780
Train Epoch: 1 [48000/60032 (80%)]	Loss: 0.176695
Train Epoch: 1 [49920/60032 (83%)]	Loss: 0.198499
Train Epoch: 1 [51840/60032 (86%)]	Loss: 0.240956
Train Epoch: 1 [53760/60032 (90%)]	Loss: 0.242950
Train Epoch: 1 [55680/60032 (93%)]	Loss: 0.162023
Train Epoch: 1 [57600/60032 (96%)]	Loss: 0.103892
Train Epoch: 1 [59520/60032 (99%)]	Loss: 0.103885

Training sometimes stops after epoch 1 because something with pickling doesn’t work. Anyhow, I don’t have the time to look into it and debug it right now. Perhaps that’s something for the coming weeks.

It would be interesting to test this on real machines rather than with virtual workers.