Revolutionizing Data Security by Design

Author: Pratik Shivaraikar

For decades, we have benefited from modern cryptography to protect our sensitive data during transmission and storage. However, we have never been able to keep the data protected while it is being processed.

Nearly 4 billion data records were stolen in 2016. Each one cost the record holder almost $158. If we do the simple math, the breaches of 2016 alone cost a whopping $632 billion. The scale, sophistication, and cost of cyber-attacks escalate every year, and today's technologies will not be able to keep pace. In such times, we need encryption technology that disorients and discourages bad actors.

For example, many years from now, a fault-tolerant, universal quantum computer with millions of qubits could quickly sift through the probabilities and decrypt even the strongest common encryption, rendering the foundational security methods we rely on today obsolete.

This is where Homomorphic Encryption comes in. It helps solve many problems in cloud infrastructure security that today's elliptic-curve cryptography (ECC) algorithms fail to address.

Shortcomings of today's encryption techniques

When it comes to cloud security, our data is encrypted in two states: in transit and in storage.

In transit, the encryption techniques that we use today suffer from a problem called TLS/SSL termination. Interestingly, this problem is also proudly marketed as a feature by reverse proxies such as Nginx and Envoy.

With TLS termination, a reverse proxy handles incoming connections, decrypts the TLS traffic, and passes the unencrypted request on to the appropriate servers. This is precisely the infrastructure limitation that attackers take advantage of. The whole threat model revolves around the fact that unencrypted data is available past the TLS termination point.

tls-termination.jpg

In the case of storage, there are two ways in which we do things today. We either store the data in our databases unencrypted, in plain text, or we apply some form of encryption. With cloud providers like GCP, AWS, and Azure, this encryption is typically done using a Key Management Service (KMS). Even then, while the data may be stored encrypted, there always comes a time when the application needs to decrypt it to perform any operation on it.

Every service that we know of today runs on unencrypted data. The trends that Twitter shows cannot be obtained by operating on encrypted data. The recommendation system on YouTube, the news feed on Facebook, and the predictions of every application that we see out there all operate on unencrypted data.

It is these very shortfalls that Homomorphic encryption aims to address.

Homomorphic encryption

Imagine if you could compute on encrypted data without ever decrypting it.
What would you do?

― Flavio Bergamaschi

Lattice-based cryptography stands out because it hides data behind hard mathematical problems on lattices. By the time computers are strong enough to crack today's encryption, the world can be prepared with lattice cryptography. Lattice cryptography, to the best of our knowledge as of today, is quantum resistant: no known quantum algorithm can efficiently break it. Lattice cryptography is also the basis of Fully Homomorphic Encryption (FHE).

Homomorphic encryption is the ability to perform arithmetic operations on encrypted data. None of the encryption techniques we commonly deploy today allow us to do that. Because of this ability, we don't need to decrypt our data to work with it. This conveniently addresses the shortcomings of our existing encryption techniques. In transit, the TLS termination problem never occurs, because the reverse proxy need not decrypt the data. It can perform its operations on the encrypted data itself and make all the necessary decisions without ever seeing the plaintext. Even in a persistent store, database queries can, in principle, be performed on encrypted data.
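To make "arithmetic on encrypted data" concrete, here is a minimal sketch using the Paillier cryptosystem, a classic scheme that is only additively homomorphic. It is not lattice-based, not fully homomorphic, and not what the toolkit used later in this article implements. The tiny primes and the function names are assumptions chosen for illustration, and the parameters are far too small to be secure.

```python
import math
import secrets

# Toy key material (assumption): real keys use primes of 1024+ bits each.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                                              # standard simple generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)                                # modular inverse (Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext using the private values LAMBDA and MU."""
    x = pow(c, LAMBDA, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


def scale_encrypted(c, k):
    """Raising a ciphertext to k multiplies the plaintext by k."""
    return pow(c, k, N_SQ)


if __name__ == "__main__":
    a, b = encrypt(20), encrypt(22)
    # Whoever runs these two lines never needs the private key.
    total = add_encrypted(a, b)
    scaled = scale_encrypted(encrypt(7), 6)
    print(decrypt(total), decrypt(scaled))
```

The party holding only the ciphertexts can call `add_encrypted` and `scale_encrypted` without ever learning 20, 22, or 7. A fully homomorphic scheme extends this idea so that both addition and multiplication of two ciphertexts are possible.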

fhe.jpg

Fully Homomorphic Encryption (FHE) protects us from honest-but-curious threat models. An honest-but-curious (HBC) adversary is a legitimate participant in a communication protocol who will not deviate from the defined protocol but will attempt to learn all possible information from legitimately received messages. A comparison with how we do things today makes this clearer.

todays-threat-model.jpg

With the way that we do things today, Alice encrypts some data and sends it as an input to Bob. Bob can decrypt that data, process it, and store it at his end. In the same way, Bob can encrypt some data and send it over to Alice, who can decrypt and process it at her end. Such a mechanism protects us against man-in-the-middle (MITM) attacks, which is why Eve can't eavesdrop on any communication between Alice and Bob. But Bob, on the other hand, has access to all this unencrypted data. Here, Bob is the honest-but-curious actor.

For the sake of convenience, we assume that Bob is an honest-but-curious actor without any malicious intent. For threat models in which Bob sits inside our cloud infrastructure with malicious intentions and free access to all this unencrypted data, there are other protocols that we can use in combination with homomorphic encryption. We will not cover those scenarios here.

Interestingly, with Homomorphic encryption, along with protection against eavesdropping and MITM, we get the added protection of not letting Bob sit on a gold mine of unencrypted data, because everything that gets stored is encrypted. This, however, does not take away Bob's ability to perform operations on the data as he used to. One of the real benefits of homomorphic encryption is that, unlike the encryption techniques that we've seen till now, we need not decrypt the data. We can perform all the operations on the encrypted data itself.

fhe-threat-model.jpg

Applications of Homomorphic encryption

Right off the bat, some of the use cases that we can consider for such an encryption technique are:

  • Oblivious queries. Searching without revealing intent. For example, today, while requesting weather info, we need to reveal our location to cloud providers. With homomorphic encryption, our location stays encrypted too, so we reveal far less of our data.
  • Set intersections. Today, to determine an overlap, both parties need to share their sets completely. Using homomorphic encryption, we can determine the overlap without disclosing the entire sets.
  • Extracting value from private data. We can run machine learning models, such as regression or neural network models, on our private data while it stays encrypted.
  • Secure outsourcing. Even today, quite a few enterprises maintain on-prem infrastructure because they do not trust cloud providers. Homomorphic encryption, because of its data privacy by design, can encourage wider cloud adoption.

Proof of Concept

Without making this article sound like an ad, let us get our hands dirty and see how Homomorphic Encryption can be implemented. Microsoft has a SEAL library, which supports homomorphic encryption. IBM, too, recently released a Fully Homomorphic Encryption toolkit for Linux. For the sake of simplicity, since IBM's FHE toolkit is packaged as a Docker container, we will be using it for our POC.

First, we need to clone the repo:


$ git clone https://github.com/IBM/fhe-toolkit-linux.git

Once cloned, we need to run the `FetchDockerImage.sh` shell script. We also need to provide the container OS as an argument to the script. For simplicity, we will be using Ubuntu:


$ cd fhe-toolkit-linux
$ ./FetchDockerImage.sh ubuntu


The download and setup of the toolkit will take some time, depending on your bandwidth and hardware.

Next, we need to run the pre-built toolkit image that the `ibmcom` organization publishes on Docker Hub:


$ ./RunToolkit.sh -p ubuntu

The output of the above command should be something similar to:


$ ./RunToolkit.sh -p ubuntu
WARNING: No swap limit support
INFO:    Using system default persistent storage path...
INFO:    Persistent data storage: "/home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace"
INFO:    CMake: Deleting cached built settings and reconfigure
INFO:    Launching FHE tookit:


docker run -d --name fhe-toolkit-ubuntu  -v /home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace:/opt/IBM/FHE-Workspace  -p 8443:8443 ibmcom/fhe-toolkit-ubuntu


8fdcd97b1d203f0e71e4602ce6d24a76cd768c5fc2f8c5ee6b99ed7acb1a7886

CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS                  PORTS                    NAMES
8fdcd97b1d20        ibmcom/fhe-toolkit-ubuntu   "code-server --bind-…"   6 seconds ago       Up Less than a second   0.0.0.0:8443->8443/tcp   fhe-toolkit-ubuntu

FHE Development is open for business: https://127.0.0.1:8443/

We now have a web server running at https://127.0.0.1:8443/. All our next operations will be through the browser.

On opening that URL in the browser and accepting the warning caused by the self-signed certificate, we get the VS Code interface in the browser. Soon, it will ask us to select a kit. Make sure to choose the option that says GCC for x86_64-linux-gnu 9.3.0.

Select kit.png
configure project.png

Next, click Build in the CMake Tools status bar to build the selected target.

build.png

If you look into the `examples/BGV_country_db_lookup` directory, you can find the `countries_dataset.csv` file. It is a list of European countries and their capital cities. When we run the toolkit, it uses the `BGV_country_db_lookup.cpp` file to encrypt the contents of the CSV. That file also contains code that allows us to search the encrypted data. Given a country name as input, the program looks through the encrypted list of countries and outputs its matching capital.
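A homomorphic scheme can only add and multiply, so a lookup has to be expressed without any `if` on the data. The sketch below shows one common way to do that, on plain unencrypted integers, using Fermat's little theorem to turn an equality test into arithmetic modulo a prime. It is not the toolkit's code: the modulus 131, the function names, and the three-row dataset are assumptions made for illustration.

```python
MODULUS = 131  # small prime (assumption); ASCII codes all fit below it


def encode(text, width):
    """Pad text to a fixed width and map each character to an integer."""
    return [ord(ch) for ch in text.ljust(width, "\0")]


def equals_mask(a, b):
    """Return 1 if a == b else 0, using only +, -, * modulo a prime.

    By Fermat's little theorem, d^(p-1) mod p is 1 for any nonzero d
    and 0 for d == 0, so 1 - d^(p-1) is an arithmetic equality test.
    """
    diff = (a - b) % MODULUS
    return (1 - pow(diff, MODULUS - 1, MODULUS)) % MODULUS


def lookup(db, query, width=16):
    """Select the value whose key equals query, without branching on data."""
    q = encode(query, width)
    result = [0] * width
    for key, value in db:
        k = encode(key, width)
        v = encode(value, width)
        # The row mask is the product of per-character masks:
        # it is 1 only if every character matches.
        mask = 1
        for a, b in zip(k, q):
            mask = (mask * equals_mask(a, b)) % MODULUS
        # Add mask * value; non-matching rows contribute zeros.
        result = [(r + mask * x) % MODULUS for r, x in zip(result, v)]
    return "".join(chr(c) for c in result).rstrip("\0")


if __name__ == "__main__":
    db = [("France", "Paris"), ("Germany", "Berlin"), ("Italy", "Rome")]
    print(lookup(db, "Germany"))
```

Every row is touched on every query, which is what keeps the server from learning which row matched. It is also part of why encrypted lookups are slow.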

Let's proceed to run the toolkit:

run.png

Following the on-screen instructions, if we enter the name of any country in the dataset, the program searches the encrypted database and outputs that country's capital.

search.png

Final thoughts

Though Homomorphic Encryption is a great and extremely promising technology, is it ready for out-of-the-box use? Absolutely not. This is evident from the POC that we did: searching an encrypted database with around 47 entries took almost 2-3 minutes. There is no denying that this is an impressive start and definitely in the right direction, but we still have a long way to go. Having said that, Homomorphic Encryption can very well be the next big breakthrough in computer science. We can only imagine the possibilities once the first FHE-enabled database, or the first FHE-supported proxy, is implemented. Either way, we're surely in for some exciting times ahead!
