Opendata, web and dolomites

Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - PPSSR (Privacy Preserving Secure Speech Recognition)

Summary

In many activities involving big data, cloud computing offers a common distributed infrastructure for storing large amounts of data in a scalable, efficient, and low-cost way. Sensitive data can be encrypted for secure storage in the cloud. However, while we have become increasingly good at encrypting data at rest, processing that data in the cloud first requires decrypting it, which rules out using the cloud’s resources to process sensitive data unless this can be done securely. Cloud users want to hide sensitive data from cloud providers; similarly, companies using cloud services want to protect their intellectual property from cloud providers and users. Hence strategies for processing data securely in the cloud are becoming increasingly important.

The objective of this project was to realise an end-to-end encrypted speech recognition system. Such a system would give industries that are currently unwilling to use cloud services (e.g. government, education, medical, financial) confidence that their data is always encrypted. If data is always encrypted in the cloud, then even if the cloud service is breached, the attacker gets only encrypted data (that has been backed up), which they still cannot access. Similarly, cloud service providers have no knowledge of the personal information contained in the data stored in the cloud.

Work performed

The innovation of this project is that cloud users can search and access their data at will, without decrypting it in the cloud. This facility is provided by a new branch of cryptography called searchable encryption. Searchable encryption has its roots in the science of information search and retrieval, and it is an exciting new branch of cryptography because it is fast and scalable. Many of the R&D challenges in this project involved taking well-known design principles from databases and building encrypted analogues that do not compromise security and privacy. Cryptography is an arms race, so in parallel we have been adding features that were not envisaged when the system was originally designed but that are necessary to keep ahead of the latest security flaws being discovered by the research community. One development in particular was the set of forward-privacy security modifications described in our deliverables.
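
To make the idea of searching without decrypting more concrete, the following is a minimal sketch of a toy searchable index, not the scheme developed in this project. It assumes the client holds a secret key and derives an opaque token for each keyword with HMAC-SHA256, so the server can match tokens without learning the keywords. The names `search_token`, `ToyServerIndex` and `index_document` are hypothetical, file contents and identifiers are left unencrypted for brevity, and the sketch is not forward-private: repeating a search repeats the same token.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def search_token(key, keyword):
    """Client side: derive a deterministic keyed token for a keyword."""
    message = keyword.lower().encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).digest()


class ToyServerIndex:
    """Server side: stores opaque tokens only, never plaintext keywords."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, token, file_id):
        self._index[token].add(file_id)

    def search(self, token):
        # Return a copy so callers cannot mutate the stored index.
        return set(self._index.get(token, set()))


def index_document(key, server, file_id, keywords):
    """Client side: upload one token per keyword for the given file."""
    for keyword in keywords:
        server.add(search_token(key, keyword), file_id)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # secret key, kept by the client only
    server = ToyServerIndex()
    index_document(key, server, 1, ["invoice", "meeting"])
    index_document(key, server, 2, ["meeting"])
    print(sorted(server.search(search_token(key, "meeting"))))  # [1, 2]
```

Without the key, the server cannot compute the token for a chosen keyword, although it can still observe which tokens are queried and how often. Limiting that kind of leakage is what properties such as forward privacy are intended to address.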

Aside from the security-related modifications to the original workplan, there was a considerable re-evaluation of the target market for the product, which in turn caused the company to re-implement much of its Automatic Speech Recognition technology.

Final results

To our knowledge, most known state-of-the-art forward-private searchable encryption schemes provide only single-keyword search, which limits their usability. Multi-keyword searches can be performed by combining several single-keyword searches, but this comes at the cost of efficiency. Our scheme provides an efficient solution that is better than this naive brute-force approach. Furthermore, the use of a bitmap-based search index gives us good locality (an efficiency metric), as it contains many consecutive file identifiers belonging to the same keyword.
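
The bitmap idea can be illustrated with a small plaintext sketch. This is an assumption-laden toy, not the project's scheme: the encryption and forward-privacy layers are omitted entirely, file identifiers are assumed to be small non-negative integers, and the function names `build_bitmap_index`, `bitmap_to_ids` and `search_all` are hypothetical. Each keyword maps to one integer used as a bitmap, where bit i is set when file i contains the keyword, so a multi-keyword (conjunctive) query becomes a bitwise AND instead of several separate result lists that must be intersected.

```python
from functools import reduce


def build_bitmap_index(docs):
    """Build keyword -> bitmap, where bit i is set if file i has the keyword."""
    index = {}
    for file_id, keywords in docs.items():
        for keyword in keywords:
            index[keyword] = index.get(keyword, 0) | (1 << file_id)
    return index


def bitmap_to_ids(bitmap):
    """Expand a bitmap into a sorted list of file identifiers."""
    ids = []
    file_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(file_id)
        bitmap >>= 1
        file_id += 1
    return ids


def search_all(index, keywords):
    """Conjunctive search: files that contain every given keyword."""
    bitmaps = [index.get(keyword, 0) for keyword in keywords]
    if not bitmaps:
        return []
    return bitmap_to_ids(reduce(lambda a, b: a & b, bitmaps))


if __name__ == "__main__":
    docs = {
        0: ["budget", "speech"],
        1: ["speech"],
        2: ["budget", "speech", "audit"],
        3: ["budget"],
    }
    index = build_bitmap_index(docs)
    print(search_all(index, ["budget", "speech"]))  # [0, 2]
```

Because all identifiers for a keyword sit in one contiguous bitmap, a lookup touches a single index entry per keyword, which is one way to read the locality benefit described above.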

A recent study by IBM interviewed 2,200 IT, data protection, and compliance professionals from 477 companies that had experienced a data breach over the preceding 12 months. According to the findings, data breaches continue to become costlier and to result in more consumer records being lost or stolen, year after year. The average financial cost of a data breach is in excess of 3.8 million dollars. Hence, in the current threat climate, it is important that we take whatever steps we can to ensure that data is always encrypted.

Our schemes achieve good performance, which makes them usable in practice and should thus give industry the confidence needed to deploy searchable encryption as a mechanism for encrypting users' data in the cloud while keeping it searchable. This will in turn help reduce the data breaches that we hear about in the news every day.

Website & more info

More info: http://www.intelligentvoice.com.