Data Governance Framework

Julian Cohen
5 min read · Oct 19, 2023



Data Governance is the security team’s responsibility for understanding, identifying, inventorying, classifying, discovering, and controlling data within an organization. This framework is designed to help you build a program to satisfy that responsibility to the level required by your organization and business.

Your data governance framework must include both technology and policy. Technology alone will not be able to understand and govern your data; you’ll need to augment it with people and process. Policy alone will not protect your data against adversaries.

A comprehensive data governance program can prevent data breaches, detect vulnerabilities that could lead to data breaches, and be an intelligence source during investigations and incidents involving your data.

Data Classification

First you need to set some rules for how data is classified in your organization. Classification requirements and criteria will be different for every organization, so it’s important to start with a complete understanding of your security, regulatory compliance, and legal requirements (make sure these teams are involved in writing your classification definitions to ensure they are accurate and complete).

Below is an example of a simple classification system for an average B2B SaaS company with GLBA or SOX compliance requirements.

Public: This is information that is publicly available on the internet, such as public datasets and public research. There are generally no security constraints on how this data is stored or accessed.

For Approved Parties: This is information that has been prepared for customers, partners, and other external parties that the business needs in order to operate, such as sales decks, pricing information, and success metrics. This information may be required to be shared under NDA. In some cases, this information may be treated as Public. There are generally few constraints on how this data is stored or accessed.

Internal: This is information that is company property and should not be exposed externally, such as intellectual property, business intelligence, customer lists, company meeting minutes, company policies and procedures, and Material Nonpublic Information. This information is meant to be confidential to current company employees. Typically, this data should only be stored in approved secure locations that are encrypted at rest and in transit with access control.

Confidential: This is information that the company has been entrusted to keep confidential, such as customer data. This information may be the most important to keep secure, as unauthorized access to this data may cause a reportable breach. Typically, this information must only be stored in approved and cataloged locations that are encrypted at rest and in transit with role-based access control.

In addition to these basic categories, I also recommend having special categories such as:

  • Credentials. Credentials may be Confidential, but may also require additional constraints, such as being stored only in approved locations (for example, Secrets Management and Password Management systems).
  • Employee Personal Information. Employee Personal Information may be Internal, but may also require additional constraints, such as compensation information being stored and accessed only by the HR team.
  • Vulnerability Information. Technical vulnerability information may be Confidential, but may also require additional constraints, such as being stored only in approved locations and never being shared with anyone outside of the company without additional approval from the CISO.
  • Financial Information. Access to certain sensitive financial information may require additional approval from the CFO.
  • Privileged Legal Information. Information and communications that are privileged under attorney-client privilege may require additional constraints and access may require additional approval from the GC.
  • Compliance and Audit Reports. Compliance and audit information may be For Approved Parties, but may also require additional constraints, such as requiring a signed NDA, viewing through a secure portal, and/or watermarking.
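
One way to make a classification scheme like this enforceable is to encode it as data that your tooling can check. The following is a minimal sketch, not a definitive implementation: the level names come from the example above, while `HandlingRule`, `HANDLING`, `SPECIAL_CATEGORIES`, and `requirements_for` are hypothetical names chosen for illustration, and the rule values only paraphrase the example definitions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Classification(IntEnum):
    """The four example levels, ordered from least to most restricted."""
    PUBLIC = 0
    FOR_APPROVED_PARTIES = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class HandlingRule:
    """Hypothetical handling requirements attached to a level."""
    approved_location_only: bool
    encrypted: bool        # at rest and in transit
    access_control: str    # "none", "minimal", "access control", "role-based"


# Assumed mapping that paraphrases the example definitions.
HANDLING = {
    Classification.PUBLIC: HandlingRule(False, False, "none"),
    Classification.FOR_APPROVED_PARTIES: HandlingRule(False, False, "minimal"),
    Classification.INTERNAL: HandlingRule(True, True, "access control"),
    Classification.CONFIDENTIAL: HandlingRule(True, True, "role-based"),
}

# Special categories: a base level plus one extra constraint (illustrative).
SPECIAL_CATEGORIES = {
    "credentials": (Classification.CONFIDENTIAL, "secrets or password manager only"),
    "employee_personal_information": (Classification.INTERNAL, "compensation data limited to HR"),
    "vulnerability_information": (Classification.CONFIDENTIAL, "CISO approval to share externally"),
}


def requirements_for(label: str) -> Tuple[HandlingRule, Optional[str]]:
    """Return the handling rule and any extra constraint for a label."""
    if label in SPECIAL_CATEGORIES:
        level, extra = SPECIAL_CATEGORIES[label]
        return HANDLING[level], extra
    # Raises KeyError for labels that are not defined levels.
    return HANDLING[Classification[label.upper()]], None
```

Using an ordered enum makes comparisons such as "at least Internal" straightforward, which is useful when other programs need to make decisions based on classification.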

Data Dictionary

Your data dictionary is asset management for your data. It’s your authoritative catalog of your datastores, with their owners, their physical and logical locations, what data is stored inside them, how that data is classified, who/what has access to them, and any other relevant business, security, and risk considerations.

There are many commercial data dictionary, data catalog, and data management solutions that automatically discover and populate this information.
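
If you want to prototype before buying a tool, the fields listed above map naturally onto a small record type. This is a rough sketch under my own assumptions: `DatastoreEntry`, `DataDictionary`, and their field and method names are hypothetical and do not correspond to any commercial product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatastoreEntry:
    """One record in a hypothetical data dictionary."""
    name: str
    owner: str
    physical_location: str   # e.g. a cloud region or data center
    logical_location: str    # e.g. an account, network, or cluster
    data_types: List[str]    # what data is stored inside
    classification: str      # highest classification of the stored data
    access: List[str] = field(default_factory=list)  # who/what has access
    notes: str = ""          # business, security, and risk considerations


class DataDictionary:
    """In-memory catalog keyed by datastore name."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatastoreEntry] = {}

    def register(self, entry: DatastoreEntry) -> None:
        # Re-registering a name overwrites the previous record.
        self._entries[entry.name] = entry

    def by_classification(self, classification: str) -> List[DatastoreEntry]:
        return [e for e in self._entries.values()
                if e.classification == classification]

    def missing_owner(self) -> List[str]:
        """Names of entries with a blank owner, sorted alphabetically."""
        return sorted(e.name for e in self._entries.values()
                      if not e.owner.strip())
```

Queries like `missing_owner()` are the kind of gap report that makes the catalog useful during reviews and incidents.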

Datastore Vulnerability Scanning

Your datastores are complex information systems that may have all kinds of vulnerabilities and issues that could put your data at risk. You must have a strategy and procedure that uses technology and process to discover and remediate datastore vulnerabilities such as cloud datastore misconfigurations, improper access controls, and more.
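
As an illustration of the technology half of that strategy, here is a minimal sketch of a rule-based misconfiguration check. The configuration keys (`public_access`, `encrypted_at_rest`, `tls_required`, `allowed_principals`) are hypothetical and stand in for whatever your cloud provider or inventory export actually reports; a real scanner would cover far more cases.

```python
from typing import Dict, List

# Each check pairs a finding description with a predicate that returns True
# when the datastore is misconfigured. Missing security settings are treated
# as failures so that incomplete data does not hide a problem.
CHECKS = [
    ("publicly accessible", lambda c: c.get("public_access", False)),
    ("not encrypted at rest", lambda c: not c.get("encrypted_at_rest", False)),
    ("TLS not enforced", lambda c: not c.get("tls_required", False)),
    ("wildcard principal in access policy",
     lambda c: "*" in c.get("allowed_principals", [])),
]


def scan(datastores: Dict[str, dict]) -> Dict[str, List[str]]:
    """Return findings per datastore; clean datastores are omitted."""
    findings: Dict[str, List[str]] = {}
    for name, config in datastores.items():
        issues = [desc for desc, is_bad in CHECKS if is_bad(config)]
        if issues:
            findings[name] = issues
    return findings
```

The process half still matters: each finding needs an owner and a remediation deadline, which is where the data dictionary helps.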

Access Control

You must have strong identity and access management for your datastores in order to prevent unauthorized access, detect and respond to malicious access, and audit and improve access controls. Start by separating human identities from machine identities and creating least-privilege roles for each employee type and application type you have. Remember to balance business requirements and business risks with security risks when creating least-privilege roles!

Use SSO and Secrets Management to prevent password reuse and enforce credential rotation. Enforce 2FA for human identities and short-lived tokens for machine identities. Use BeyondCorp/Zero Trust to enforce that human identities are only used from approved or trusted devices, and use machine-assumable roles to enforce that machine identities are only used from specific machine instances.
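
The rules in the two paragraphs above can be expressed as a simple policy check. This is a sketch under stated assumptions: `AccessRequest`, `violations`, and the one-hour `MAX_MACHINE_TOKEN_TTL_SECONDS` threshold are hypothetical, and in practice your identity provider and cloud IAM would enforce these conditions rather than application code.

```python
from dataclasses import dataclass
from typing import List

# Assumed threshold for what counts as a "short-lived" machine token.
MAX_MACHINE_TOKEN_TTL_SECONDS = 3600


@dataclass
class AccessRequest:
    identity_type: str             # "human" or "machine"
    role: str
    mfa_verified: bool = False     # human: second factor completed
    device_trusted: bool = False   # human: approved or trusted device
    token_ttl_seconds: int = 0     # machine: lifetime of the credential
    instance_bound: bool = False   # machine: role assumed from an expected instance


def violations(req: AccessRequest) -> List[str]:
    """List the policy rules a request breaks; an empty list means allowed."""
    problems: List[str] = []
    if req.identity_type == "human":
        if not req.mfa_verified:
            problems.append("2FA not verified")
        if not req.device_trusted:
            problems.append("untrusted device")
    elif req.identity_type == "machine":
        if not 0 < req.token_ttl_seconds <= MAX_MACHINE_TOKEN_TTL_SECONDS:
            problems.append("token is not short-lived")
        if not req.instance_bound:
            problems.append("role not bound to a machine instance")
    else:
        problems.append("unknown identity type")
    return problems
```

Keeping human and machine rules in separate branches mirrors the separation of identity types recommended above.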

Continuous Monitoring

Data Discovery

You will never have a fully complete and accurate understanding of your organization and its infrastructure, because it’s constantly changing. So, your processes and techniques for identifying and inventorying your data and datastores need to be continuous. You may want to use continuous discovery technology or processes to ensure that you always have the most complete and accurate understanding you can.
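
At its core, continuous discovery is a recurring comparison between what exists and what you have cataloged. A minimal sketch, assuming you can obtain a list of discovered datastore names (for example from cloud APIs or network scans) and a list of names from your data dictionary; the `reconcile` function and its result keys are hypothetical.

```python
from typing import Dict, Iterable, Set


def reconcile(discovered: Iterable[str],
              inventoried: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare discovered datastores against the data dictionary."""
    found, known = set(discovered), set(inventoried)
    return {
        # Exists in the environment but is missing from the catalog.
        "uninventoried": found - known,
        # Cataloged but no longer seen; may be deleted or unreachable.
        "stale": known - found,
    }
```

Running a comparison like this on a schedule turns discovery from a one-time project into an ongoing control, and the "uninventoried" set is also a natural starting point for finding shadow datastores.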

You may also want to create process and automation around creating new datastores. Giving the organization an easy, secure way to create new datastores and store new data makes it more likely for the right decisions to be made by different teams as they introduce new data and infrastructure.

Access Control

As your infrastructure and roles grow in size and complexity, there are going to be identity and access management vulnerabilities, such as overprivileged roles, temporary access that became permanent, improperly offboarded identities, cross-environment access (such as dev to prod), and more. You must constantly be evaluating and improving access control technology and processes to ensure that these issues are being remediated. The more centralized your identity and access control, the easier it is to audit and remediate issues.
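
Several of the issues named above can be detected mechanically once grants are centralized. The sketch below is illustrative only: the `Grant` record and `audit` function are hypothetical, and the data would come from your own identity provider and HR system. Detecting overprivileged roles usually needs usage data and is not covered here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass
class Grant:
    identity: str
    role: str
    identity_env: str    # environment the identity belongs to, e.g. "dev"
    resource_env: str    # environment of the datastore, e.g. "prod"
    temporary: bool = False
    expires: Optional[date] = None


def audit(grants: List[Grant], active_identities: Set[str],
          today: date) -> List[str]:
    """Flag offboarded identities, lingering temporary access,
    and cross-environment access."""
    findings: List[str] = []
    for g in grants:
        if g.identity not in active_identities:
            findings.append(f"{g.identity}: offboarded identity still holds {g.role}")
        if g.temporary and (g.expires is None or g.expires < today):
            findings.append(f"{g.identity}: temporary access to {g.role} "
                            "is expired or has no expiry")
        if g.identity_env != g.resource_env:
            findings.append(f"{g.identity}: cross-environment access "
                            f"({g.identity_env} to {g.resource_env})")
    return findings
```

The `today` parameter is passed in rather than read from the clock so that the audit is easy to test and to replay for a past date during an investigation.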

Shadow IT and Shadow Engineering

Any shadow IT or shadow engineering may result in uninventoried datastores and unclassified data, which are much harder to secure and control according to your data governance program. You may want technology and processes to uncover shadow IT and shadow engineering, along with policies that deter and disincentivize them.

Adjacent Programs

Don’t forget that your data governance program is going to inform, and be informed by, other business processes and security programs. Examples of these programs include, but are not limited to:

Data Loss Prevention

Your data classifications and your data dictionary may assist in developing an effective DLP program.

Third Party Risk Management

Your data governance policies and data classifications are going to dictate which data can be stored with which third parties, which third parties can have access to which systems, and more.
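
For example, a vendor register can record the highest classification each third party is approved to hold, and tooling can check proposed data flows against it. This is a hedged sketch: the vendor names, the `VENDOR_MAX_LEVEL` register, and the `may_store` function are all hypothetical, and the level names follow the example classification system.

```python
# Levels ordered from least to most restricted, following the example
# classification system.
LEVELS = ["public", "for_approved_parties", "internal", "confidential"]

# Hypothetical vendor register: the highest level each vendor may hold.
VENDOR_MAX_LEVEL = {
    "marketing-site-host": "public",
    "crm-provider": "internal",
    "cloud-data-warehouse": "confidential",
}


def may_store(vendor: str, classification: str) -> bool:
    """True if the vendor is approved for data at this classification.

    Raises ValueError if the classification is not a defined level.
    """
    level = LEVELS.index(classification)
    approved = VENDOR_MAX_LEVEL.get(vendor)
    if approved is None:
        return False  # vendors that have not been reviewed are denied
    return level <= LEVELS.index(approved)
```

Denying unknown vendors by default means a new third party has to go through review before any data is sent to it.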

Privacy

Depending on your business and your privacy obligations, a large part of your data governance policies and procedures may cover privacy compliance, including data discovery and classification in third-party applications, data copy/deletion requests, and data minimization.


Julian Cohen

Risk philosopher. CISO. Team and program builder. Ex-vulnerability researcher. Ex-CTF organizer and competitor.