Guidelines for Practical Use of Data Using Secure Computation

NTT Communications, the National Institute of Advanced Industrial Science and Technology (AIST), Nomura Research Institute, NRI Secure Technologies, and GMO CyberSecurity by Ierae have published the interim report on the "Guidelines for Data Utilization Using Secure Computation." The aim is to promote the understanding and utilization of secure computation to facilitate data sharing and utilization while addressing privacy and security concerns.
The guidelines are structured into three main parts:
- General Overview: Provides the overall structure of the guidelines and fundamental knowledge about secure computation, its position within Privacy-Enhancing Technologies (PETs), and introduces potential use cases. This blog post covers the general overview.
- Realization Process for Secure Computation Projects: Focuses on project management, processes, and existing examples for sharing and utilizing personal data using secure computation (specifically those employing secret sharing). The intended audience is project managers and those responsible for such initiatives.
- Data Management and Security in Secure Computation: Discusses the legal status of secure computation in Japan and the requirements for system providers to securely manage data within secure computation systems. This section is aimed at providers and those evaluating the security of these systems.
The overview document highlights the increasing importance of data as a source of wisdom, value, and competitiveness in the advancing digital society. It notes the necessity of integrating diverse data sources for informed decision-making and the potential for significant societal and economic benefits in areas like healthcare and industry through data utilization.
However, the report emphasizes the inherent security challenges in data sharing, such as data leaks, unauthorized access, and privacy breaches. Traditional encryption methods, while effective for data in transit and at rest, require decryption for processing, which creates a vulnerability at the point of data utilization.
To address this limitation, the guidelines focus on secure computation, a technology that allows for computation on encrypted (cryptographically concealed) data without the need for decryption, thus preserving the confidentiality of the data during processing. The goal of these guidelines is to facilitate the smooth initiation of projects involving personal data processed with secure computation, address legal considerations, and establish robust data management policies for secure computation system providers and users.
The target audience for these guidelines includes individuals interested in secure computation, those promoting data utilization projects using this technology, and businesses developing secure computation systems. The current document is an interim report, published to gather broad feedback that will inform the final version, which is slated for release in fiscal year 2025.
1. Overview of the Guidelines
1.1 Definition of Data Utilization
The guidelines define data utilization as "the act of an organization solving management or social issues by utilizing its own data or the data of other organizations." The underlying premise is that the purpose of data utilization is to solve the challenges faced by the organization, which can include both private companies and public institutions. The definition explicitly acknowledges that data utilization may involve handling data from external entities, especially when dealing with personal data or trade secrets, where barriers to cross-organizational utilization are higher. Secure computation is presented as a key solution to overcome these barriers.
1.2 Definition of Stakeholders in General Data Utilization
The report defines several key stakeholders involved in typical data utilization scenarios:
- Data Registrant: Bears the responsibility for managing the data used for analysis and may personally register data into the system. Their objective is to gain value from data provision (e.g., monetary compensation or public benefit like improved healthcare).
- System Provider: Provides the system used for data management and analysis. Their objective is to earn system usage fees from data registrants, analysts, or result users.
- Data Analyst: Analyzes data to derive insights necessary for achieving the objectives. Their objective is to gain value by providing insights.
- Analysis Result User: Utilizes the insights gained from the analysis to achieve their objectives.
- Data Collaborator: Initiates and promotes data utilization projects, often aligning the interests and necessary contracts between other stakeholders. A data collaborator will always also fit into one of the other defined roles.
1.3 Additional Stakeholder in Data Utilization Using Secure Computation
When secure computation is employed, a new stakeholder emerges:
- Secure Computation Executor: This entity performs the necessary computations on the encrypted data on behalf of the data analyst, without decrypting the data. Typically, the computational infrastructure is provided by the System Provider.
2. Secure Computation Overview
This section provides a fundamental understanding of secure computation for interested individuals.
2.1 What is Secure Computation?
Secure computation is defined as "a general term for technologies that allow for computation (such as data processing, statistical processing, and machine learning) while the data remains encrypted." Unlike traditional data analysis, where computation occurs on plaintext data, secure computation operates on cryptographically concealed data, and only the computation results are decrypted, leaving the original data protected.
The document illustrates this with a scenario where two organizations want to calculate the average of their data without revealing their individual datasets.
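The averaging scenario can be sketched with additive secret sharing. This is a minimal illustration, not the protocol used by any product named in the report: the modulus `PRIME`, the three-server setup, the function names, and the sample values are all assumptions made for the example, and the inputs are assumed to be non-negative integers.

```python
import secrets

# Assumed modulus for share arithmetic; totals must stay below it.
PRIME = 2**61 - 1

def share(value, n_parties=3):
    """Split a non-negative integer into n additive shares mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; any proper subset looks uniformly random."""
    return sum(shares) % PRIME

def secure_average(datasets, n_parties=3):
    """Average over all datasets, revealing only the combined sum and count."""
    sum_shares = [0] * n_parties
    count_shares = [0] * n_parties
    for data in datasets:
        # Each organization shares its local sum and record count.
        local_sum = share(sum(data), n_parties)
        local_count = share(len(data), n_parties)
        # Each server adds the shares it receives, without communication.
        for i in range(n_parties):
            sum_shares[i] = (sum_shares[i] + local_sum[i]) % PRIME
            count_shares[i] = (count_shares[i] + local_count[i]) % PRIME
    return reconstruct(sum_shares) / reconstruct(count_shares)

org_a = [52, 61, 47]   # hypothetical values held by organization A
org_b = [70, 55]       # hypothetical values held by organization B
average = secure_average([org_a, org_b])
plaintext_average = sum(org_a + org_b) / len(org_a + org_b)
```

In this sketch no single server ever holds either organization's sum, and only the combined total and count are reconstructed. Comparing `average` with `plaintext_average` on test data is one simple way to check that the concealed computation matches the plaintext one.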
Secure computation is highlighted as a promising technique for protecting personal data and trade secrets during analysis, thereby promoting data sharing. The report mentions several main types of secure computation based on the method of data concealment:
- Secret Sharing: Involves dividing data into fragments (shares) distributed across multiple computers. Computation is performed through communication between these computers. It is noted for relatively small encrypted data sizes and lower computational requirements but higher communication overhead.
- Homomorphic Encryption: Allows for addition and multiplication operations to be performed directly on encrypted data. While it reduces communication rounds after the data is encrypted and stored, it typically results in larger ciphertext sizes and requires more computational power. Sharing secret keys in collaborative analysis can also be challenging.
- Garbled Circuits: Transforms both the data and the computation function into a circuit composed of gates operating on random labels corresponding to 0 or 1. It involves two parties: one generating the garbled circuit and labels, and another computing using them. Communication rounds are low, but the transfer and computation of garbled circuits and labels can be bottlenecks.
- Specialized (Purpose-Built) Secure Computation: Designed for specific tasks, such as Private Set Intersection (PSI) for finding common elements in multiple datasets without revealing the full datasets. These methods often utilize secret sharing or homomorphic encryption as underlying components and offer high performance for their specific purpose but lack versatility for diverse analyses.
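To make the idea of computing on ciphertexts concrete, the following sketch implements a toy version of the Paillier cryptosystem. Note the swap: Paillier is only additively homomorphic (ciphertexts can be added, and a ciphertext can be multiplied by a plaintext constant), whereas the fully homomorphic schemes mentioned in the report also support multiplying two ciphertexts. The parameters are assumptions chosen for readability and are far too small for real use, and the function names are hypothetical.

```python
import math
import secrets

# Toy parameters: two known Mersenne primes. Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    """Return the public modulus n and the private pair (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt 0 <= m < n using generator g = n + 1 and fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)

n, priv = keygen()
c1, c2 = encrypt(n, 1200), encrypt(n, 345)
total = decrypt(n, priv, add_encrypted(n, c1, c2))
tripled = decrypt(n, priv, scale_encrypted(n, c1, 3))
```

The party that computes `add_encrypted` or `scale_encrypted` needs only the public modulus `n`, which is why the approach requires little interaction once the data is encrypted. The ciphertexts live modulo `n * n`, so each one is about twice the size of the modulus regardless of how small the plaintext is, which hints at the ciphertext growth the report mentions.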
2.2 Usefulness of Secure Computation in Promoting Data Sharing
Secure computation is emphasized as a highly valuable technology for facilitating data sharing. It enables multiple parties to obtain computation results without disclosing their underlying data, making it possible to safely analyze sensitive data (e.g., healthcare, financial, public, trade secrets) without direct sharing. This not only enhances data protection but also strengthens compliance by making it difficult to intercept and misuse data during processing.
Furthermore, because data analysts do not see the raw data, secure computation can foster collaboration even among competitors. For instance, financial institutions can jointly develop fraud detection models using their respective data without revealing individual customer information. By reducing data protection risks, secure computation is expected to promote innovation through increased data utilization.
2.3 Technical Maturity and Reliability of Secure Computation
The report acknowledges that the technical maturity and characteristics vary among different types of secure computation.
- Secret Sharing: The first methods were proposed in 1988, and significant speed improvements have since been achieved in both theory and implementation. It is currently a practical option for large-scale data processing (on the order of millions of records) and computationally intensive AI tasks, especially with high-speed communication lines. Common deployments often involve 2-4 computers for practicality in terms of processing speed and server management costs.
- Homomorphic Encryption: A newer technology, with fully homomorphic encryption proposed in 2009. It is noted to be relatively fast for applications like making predictions using existing machine learning models while keeping the input data secret.
- Garbled Circuits: Offer low communication rounds as computation within the garbled circuit does not require interaction. However, the generation and exchange of garbled circuits and labels can be performance bottlenecks.
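The dependence of secret sharing on communication speed can be seen in how two shared values are multiplied. The sketch below uses Beaver multiplication triples, a standard textbook technique; it is not taken from the report. The two-server setup, the trusted dealer that hands out triples, the modulus, and all function names are assumptions for illustration.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus for share arithmetic

def share2(value):
    """Split a value into two additive shares mod PRIME."""
    s0 = secrets.randbelow(PRIME)
    return [s0, (value - s0) % PRIME]

def open_value(shares):
    """Both servers publish their shares to reveal a value (one message each)."""
    return sum(shares) % PRIME

def beaver_triple():
    """Hypothetical trusted dealer: shares of random a, b and of c = a * b."""
    a = secrets.randbelow(PRIME)
    b = secrets.randbelow(PRIME)
    return share2(a), share2(b), share2(a * b % PRIME)

def multiply(x_sh, y_sh, triple):
    """Multiply shared x and y without revealing either."""
    a_sh, b_sh, c_sh = triple
    # Communication round: the servers open the masked values d and e.
    d = open_value([(x_sh[i] - a_sh[i]) % PRIME for i in range(2)])
    e = open_value([(y_sh[i] - b_sh[i]) % PRIME for i in range(2)])
    # Local step: each server combines its shares with the public d and e.
    z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % PRIME for i in range(2)]
    z_sh[0] = (z_sh[0] + d * e) % PRIME  # the public term is added once
    return z_sh

product = open_value(multiply(share2(6), share2(7), beaver_triple()))
```

Additions of shared values need no messages at all, but every multiplication in this sketch costs one exchange between the servers. An analysis with many dependent multiplications therefore accumulates network round trips, which is consistent with the report's remark that high-speed communication lines matter.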
The document also addresses the reliability of secure computation results. Since data analysts typically cannot view the original data, traditional verification methods are challenging. To ensure trust, the report suggests two main approaches:
- Methods by Implementers: Demonstrating adherence to standards (e.g., ISO/IEC 4922-2:2024), and comparing results of secure computation with plaintext computation where feasible.
- Third-Party Evaluation: Publishing the secure computation program code or undergoing audits by independent bodies, such as audit firms, and disclosing the results.
2.4 Application Examples of Secure Computation
This section introduces specific secure computation services developed or offered by the collaborating entities.
- SeCIHI: A secret sharing-based secure computation cloud service provided by NTT Communications. It features a 3-server client-server architecture. Key features include data concealment through secret sharing, access control based on user roles (data registrant, data analyst), a variety of available analysis methods (basic data processing, statistical operations, statistical testing, regression analysis), and multiple user interfaces (GUI, Jupyter Notebook-like interface with R-like commands, and APIs for external system integration). NTT Communications handles environment setup, maintenance, and operation. On-premises deployment options are also being developed. The service has been used in various fields, including a demonstration experiment with Chiba University Hospital since 2022.
- QueryAhead®: A secret sharing-based secure computation database platform from ZenmuTech. It employs a 2-server client-server architecture. Clients handle data secret sharing, recovery, random number generation, and issuing secure computation requests (queries). Key features include the ability to write secret computation request programs in general-purpose languages like Python and SQL (without requiring specialized knowledge of secure computation or cryptography), granular access control for each user at the table and record level (allowing fine-grained policies for recovery, querying, data insertion, and deletion), and flexible system deployment on both cloud and on-premises environments (server provided as Docker containers, client as a Python module). Use cases include leveraging data from multiple organizations while preserving the confidentiality of each entity's trade secrets, and outsourcing data analysis securely by processing data on a QueryAhead® instance in an external cloud without the need for data restoration.
3. Trends in PETs Including Secure Computation and the Positioning of Secure Computation
This section discusses the broader context of Privacy-Enhancing Technologies (PETs) and the role of secure computation within this landscape.
PETs are defined as "a group of technologies that enable the collection, processing, analysis, and sharing of information while protecting data confidentiality and privacy." The increasing importance of PETs is driven by the growth of data-driven businesses and research, coupled with the necessity to protect personal data. International bodies like OECD and regulatory agencies in the US and UK have issued reports and guidelines on PETs, and they are also gaining attention in Japan within the context of personal information protection law revisions.
Secure computation is identified as a key technology within PETs because it allows for collaborative data analysis while keeping the data confidential. The OECD categorizes PETs broadly into technologies that:
- Process data while concealing it (e.g., secure computation).
- Obfuscate data (e.g., anonymization, pseudonymization).
- Perform distributed analysis (e.g., federated learning).
- Ensure data reliability (e.g., secret sharing).
The report emphasizes that secure computation is particularly strong in protecting data during utilization. Other PETs have limitations: anonymization can affect the accuracy of analysis results or restrict analysis methods; federated learning, while not requiring direct data sharing, necessitates sharing model parameters, offering only limited data protection; and technologies ensuring data reliability often require data to be restored for utilization, thus not protecting it during that phase. Consequently, secure computation is presented as a primary candidate for analyses requiring accuracy comparable to plaintext analysis, such as those involving joining datasets based on unique IDs.
A sidebar clarifies the difference between secure computation and anonymization. While both aim to protect privacy, secure computation allows for accurate analysis on concealed data, whereas anonymization involves modifying the data itself, potentially affecting the accuracy of results. The choice between them, or a combination, depends on the specific privacy goals of the application.
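The accuracy trade-off can be illustrated with a small numeric sketch. It assumes one simple anonymization technique, generalizing exact ages into 10-year bins, and compares the mean of the generalized values with the exact mean that a secure computation would return. The data and the function name are hypothetical.

```python
from statistics import mean

def generalize_age(age, width=10):
    """Replace an exact age with the midpoint of its bin (simple generalization)."""
    low = (age // width) * width
    return low + width / 2

ages = [21, 22, 34, 38, 41, 57, 59]  # hypothetical raw data

# Secure computation would return the same value as the plaintext mean.
exact_mean = mean(ages)

# Anonymization changes the data itself, so the statistic shifts.
anonymized_mean = mean([generalize_age(a) for a in ages])
error = abs(anonymized_mean - exact_mean)
```

Here the generalized data gives a slightly different mean, with the error bounded by half the bin width. Whether such a deviation is acceptable depends on the analysis, which is the choice the sidebar describes.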
4. Use Cases of Secure Computation
This section provides nine real-world or pilot use cases of secure computation across three themes:
4.1 Primary Collection and Utilization of Personal Data × Secure Computation
- (1) Privacy-Preserving Statistics Project by COVID-19 Contact Tracing System (USA, Secret Sharing): US public health authorities used pre-installed contact tracing systems on smartphones to collect and statistically analyze contact history data during the COVID-19 pandemic. To overcome citizen resistance to personal data collection by government and big tech, they adopted data minimization and secure computation, collecting only contact history with infected individuals in a way that prevented tracing the data back to the provider. This allowed for statistical analysis of infection spread and risk assessment while addressing privacy concerns.
- (2) Inflammatory Bowel Disease (IBD) Observational Study at Chiba University Hospital (Japan, Secret Sharing): Chiba University Hospital conducts observational research on IBD, aiming to improve remission maintenance and quality of life. To gather data on patients' daily lives and medication adherence that is difficult to obtain through consultations, they combined an ePRO (electronic patient-reported outcome) app with secure computation. Patients answered online surveys about their symptoms and concerns, with responses concealed by secure computation, ensuring that Chiba University and NTT Communications could only access statistical results. This enhanced trust and provided more accurate data for research and service improvement.
- (3) Microsoft Edge's Password Leak Detection Feature (USA, Homomorphic Encryption): Microsoft implemented a feature in its Edge browser to detect and notify users about password leaks. By using homomorphic encryption to conceal users' stored passwords, the browser can compare them against a database of leaked passwords without Microsoft ever seeing the actual password content or the comparison results. This allows users to check for compromises while maintaining their privacy.
4.2 Data Sharing Between Organizations × Secure Computation
- (1) Wage Gap Survey in Boston City (USA, Secret Sharing): The city of Boston, through the Boston Women's Workforce Council (BWWC), has been using secure computation to collect and analyze employee wage data from over 200 companies since 2015, aiming to identify gender and race-based wage disparities. The secure computation system, developed by Boston University, ensures that even BWWC cannot access individual companies' raw data, addressing security and privacy concerns and enabling large-scale data collection for policy and corporate use.
- (2) Inclusive Travel: Effect Measurement of Welfare Budget Efficiency Measures Related to Mobility (Netherlands, Secret Sharing): The provinces of Groningen and Drenthe in the Netherlands conducted a data analysis project to identify attributes of mobility-impaired individuals (elderly, disabled, students) likely to switch from subsidized taxi services to free public transport passes. They used a secure computation system from Roseman Labs to collect and join mobility history data from private taxi companies with attribute data from municipalities, analyzing changes in travel patterns before and after the introduction of free passes. While the initial analysis was affected by the COVID-19 pandemic, the project was recognized for its privacy-preserving innovation.
- (3) Demonstration Experiment of "Secret Cross Statistics" by Japan Airlines (JAL) and NTT Docomo (Japan, Homomorphic Encryption): In 2022, JAL and NTT Docomo conducted a trial using "Secret Cross Statistics." NTT Docomo's mobile location data was filtered to include only JAL passengers, and this data was statistically analyzed to inform measures for preventing flight delays and promoting tourism at destinations. Secure computation was used to match NTT Docomo's user attribute data with JAL's passenger data based on common IDs, allowing JAL to access aggregated passenger flow data without either company being able to identify individuals during the data processing.
4.3 Building Data Collaboration Platforms × Secure Computation
- (1) OSCAR DREAM: Building a Data Analysis Platform for Cancer Clinical Research (Denmark, Secret Sharing): Denmark is using secure computation to build a national data analysis platform to facilitate cancer clinical research by linking diverse registries held by government agencies, municipalities, hospitals, and private companies. This public-private partnership aims to create a one-stop environment for timely, cross-sectional analysis of census data, health statistics, and hospital records. While still under development (as of 2024), the platform is expected to streamline the process for pharmaceutical companies to access and combine multiple registries for research, aiming to position Denmark as a leading test market for cancer drugs and clinical trials.
- (2) MPC4AML: Money Laundering Detection System Development Research (Netherlands, Secret Sharing): ABN Amro Bank and Rabobank in the Netherlands are collaborating on a project called "MPC4AML" to develop a system for sharing risk scores associated with bank accounts to detect money laundering more effectively. They use secure computation to transmit risk scores (derived from transaction history) between the banks in an encrypted form. When funds are transferred from an account flagged as high-risk in one bank to an account in the other, the receiving account is automatically also flagged, without either bank revealing the underlying transaction details or risk assessment logic. This research aims to create an industry-wide infrastructure for enhanced anti-money laundering efforts.
- (3) PATH Finder Advantage: Human Trafficking Investigation Support System for Police (USA, Homomorphic Encryption): DeliverFund, a US non-profit, partnered with Enveil to launch "PATH Finder Advantage," a human trafficking investigation support system. It contains a securely stored, encrypted database of human trafficking-related data (victims, perpetrators, contact information, social media, IP addresses) gathered by DeliverFund from online sources and the dark web. Law enforcement agencies across multiple states can use the system to compare their investigation data (phone numbers, social media, IP addresses) against this database without sharing their data with DeliverFund or revealing it to third parties, enabling them to identify connections to past trafficking cases.
5. Future Prospects of Secure Computation
This section discusses anticipated challenges and countermeasures as secure computation technology advances and becomes more widespread.
5.1 Development and Popularization Trends of Secure Computation Technology
While research on secure computation has been active since the 1980s, practical implementation was long hindered by the overhead of encryption processes and the difficulty of applying standard efficient algorithms, resulting in significant slowdowns compared to regular computation. However, recent advancements in faster secure computation methods and improvements in computing and network performance have dramatically reduced the processing speed gap. The ability to perform a wider range of computations, including basic statistics and complex machine learning, through the development of various algorithms has also contributed to its increased practicality. The focus of current research is shifting from solely improving speed and computational efficiency to developing practical features for social implementation.
In addition to technical progress, societal trends, particularly the strengthening of privacy protection regulations globally (starting with GDPR), have driven government bodies and international organizations to promote PETs, including secure computation. This has led to an increase in businesses related to secure computation, such as providing foundational technologies and specialized software. While early adoption has been prominent in sectors dealing with highly sensitive data like finance, healthcare, and the public sector, the technology is expected to gradually expand to other industries facing data privacy, security, and digital ethics challenges.
5.2 Anticipated Challenges and Countermeasures After Popularization
Fragmentation (Siloing) with External Systems
As the use of secure computation grows, there is an expectation that different companies, industries, and countries will adopt secure computation systems based on various methods. Currently, these systems, developed by different providers, operate on distinct algorithms, making it impossible to directly process data encrypted with one algorithm using a system based on another. This poses a challenge for interoperability and data sharing between different secure computation systems.
The difficulty in interoperation extends beyond encrypted data. Even the decrypted results of secure computations lack standardized data types across different providers. This necessitates custom data conversion functionalities for linking results between systems, increasing design and development costs.
This fragmentation of encrypted data and processing results into isolated systems (silos) is a significant concern for the widespread adoption and benefit of secure computation. To prevent this, functionalities that enable secure and easy interconnection with general plaintext systems and external secure computation systems are deemed necessary. SIP3 (the third phase of Japan's Cross-ministerial Strategic Innovation Promotion Program) is working on establishing such features, referred to as interoperability functions, which include methods for safely converting between different encryption schemes, data transfer methods, and external APIs for system integration.
Overview of Interoperability Functions
The document presents a conceptual diagram illustrating the interoperability of secure computation systems.
By utilizing interoperability functions, a secure computation system can exchange data with external general systems (plaintext systems) and other secure computation systems, as well as share analysis results from multiple systems.
Four potential approaches for realizing interoperability are mentioned:
- Loosely Coupled Method ①: Data analysts process data on individual secure computation/plaintext systems and integrate the results in their own environment. This is excluded from the scope of interoperability functions as the systems are independently used and not interconnected.
- Loosely Coupled Method ②: Data analysts send analysis queries to an interoperability function API, which then forwards the queries to individual systems. The results from each system are integrated by the interoperability function and returned to the data analyst. In this method, raw data remains isolated at the system level, and only results are re-computed in a common area.
- Loosely Coupled Method ③: Data stored in one secure computation system is processed and decrypted, then re-encrypted for the target system, where secure computation is performed. The data is therefore temporarily in decrypted form between the two systems, and the final results are returned to the data analyst's environment.
- Tightly Coupled Method: The raw data stored in a secure computation system is transferred to the target system while still encrypted, with the encryption method converted if necessary, and secure computation is then performed there. The data is not decrypted along the way, and the results are returned to the data analyst's environment.
The report states that the feasible functionalities and detailed implementation methods for each approach were still under consideration when the interim report was written and will be elaborated in the final version of the guidelines in fiscal year 2025.
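As a rough illustration of what converting data between systems without decrypting it could look like, the sketch below moves a value from a three-server additive sharing to a two-server additive sharing. This is a textbook resharing technique and only an assumption about one possible building block; the report does not specify how the tightly coupled method will be implemented. The modulus, server counts, and function names are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus shared by both systems

def share(value, n):
    """Split a value into n additive shares mod PRIME."""
    parts = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % PRIME)
    return parts

def reshare(source_shares, n_target):
    """Convert shares held by the source servers into shares for n_target servers.

    Each source server splits its own share into sub-shares and sends one
    sub-share to each target server; each target server sums what it receives.
    The secret itself is never reconstructed at any point.
    """
    target = [0] * n_target
    for s in source_shares:  # runs locally on each source server
        for j, sub in enumerate(share(s, n_target)):
            target[j] = (target[j] + sub) % PRIME  # target server j accumulates
    return target

system_a_shares = share(98765, 3)            # value held by a 3-server system
system_b_shares = reshare(system_a_shares, 2)  # same value, 2-server system
recovered = sum(system_b_shares) % PRIME     # opened here only to check the sketch
```

Conversion between genuinely different schemes, for example from secret sharing to homomorphic encryption, is more involved than this same-family case.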
Anticipated Use Case of Interoperability Functions
A hypothetical use case involving two different medical institutions (A and B) performing cross-sectional analysis of medical data using loosely coupled method ② is presented. It assumes that both institutions have their medical data stored and utilized within their own distinct secure computation systems (System A and System B).
Without interoperability, conducting a joint analysis would require medical institutions A and B to establish connections to each other's secure computation systems, send analysis queries to each system separately, and then manually integrate the results. However, if their secure computation systems are linked via interoperability functions, they would only need to interact with the interoperability layer to send a single analysis query that is then distributed to both systems, receiving a consolidated result.
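The query flow of loosely coupled method ② can be sketched as follows. Everything here is hypothetical: the class names, the `run_query` and `mean` methods, and the sample records are assumptions for illustration, and the `SecureSystem` class merely stands in for a real secure computation system that would compute on concealed data.

```python
from dataclasses import dataclass

@dataclass
class PartialResult:
    """Aggregate returned by one system; no individual records leave it."""
    total: int
    count: int

class SecureSystem:
    """Stand-in for one institution's secure computation system (hypothetical)."""
    def __init__(self, name, records):
        self.name = name
        self._records = records  # concealed data in a real deployment

    def run_query(self, column):
        values = [r[column] for r in self._records if column in r]
        return PartialResult(total=sum(values), count=len(values))

class InteropLayer:
    """Hypothetical interoperability API: forwards one query, merges the results."""
    def __init__(self, systems):
        self.systems = systems

    def mean(self, column):
        partials = [s.run_query(column) for s in self.systems]
        total = sum(p.total for p in partials)
        count = sum(p.count for p in partials)
        return total / count if count else None

system_a = SecureSystem("A", [{"stay_days": 4}, {"stay_days": 7}])
system_b = SecureSystem("B", [{"stay_days": 3}, {"stay_days": 5}, {"stay_days": 11}])
layer = InteropLayer([system_a, system_b])
combined_mean = layer.mean("stay_days")
```

The analyst issues a single call to the interoperability layer, and only per-system aggregates are combined in the common area, matching the description that raw data stays isolated at the system level.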
The benefit of loosely coupled method ② is highlighted as the potential reduction in the effort required for security evaluation of systems when initiating data utilization with other organizations. This is particularly relevant for healthcare data, where ethical guidelines necessitate security assessments of the systems used. By utilizing systems with pre-implemented interoperability functions, the need for new system evaluations for each collaboration could be eliminated, streamlining the ethical review process.
6. Conclusion
The overview document of the "Guidelines for Data Utilization Using Secure Computation" provides a comprehensive introduction to the concept, benefits, technical aspects, and applications of secure computation. It underscores the technology's potential to revolutionize data sharing and utilization by addressing critical privacy and security concerns. The report highlights the ongoing development and increasing maturity of secure computation, along with its growing adoption across various sectors. Furthermore, it proactively identifies system fragmentation as a key hurdle to widespread adoption and outlines potential solutions through the development of interoperability functions. The interim report sets the stage for more detailed discussions in subsequent sections of the guidelines and signals a significant push towards the practical implementation and standardization of secure computation technologies for a more secure and collaborative data-driven society.