INFORMATION MANAGEMENT AND DATA SECURITY
Elasticity, multi-tenancy, new physical and logical architectures, and abstracted controls require new data security strategies.
Information management and data security in the cloud era demand both new strategies and technical architectures.
Fortunately, not only do users have the tools and techniques needed, but the cloud transition also creates opportunities to better secure data in our traditional infrastructure.
The authors recommend using a Data Security Lifecycle (explored below) for evaluating and defining cloud data security strategy. This should be layered with clear information governance policies, and then enforced by key technologies such as encryption and specialized monitoring tools.
This domain includes three sections:
(A) Section 1 provides background material on cloud information (storage) architectures.
(B) Section 2 includes best practices for information management, including the Data Security Lifecycle.
(C) Section 3 details specific data security controls, and when to use them.
5.1 Cloud Information Architectures
Cloud information architectures are as diverse as the cloud architectures themselves. While this section can’t possibly cover all potential permutations, there are certain consistent architectures within most cloud services.
5.1.1 Infrastructure as a Service
IaaS, for public or private cloud, generally includes the following storage options:
(I) Raw storage : This includes the physical media where data is stored. May be mapped for direct access in certain private cloud configurations.
(II) Volume storage : This includes volumes attached to IaaS instances, typically as a virtual hard drive. Volumes often use data dispersion to support resiliency and security.
(III) Object storage : Object storage is sometimes referred to as file storage. Rather than a virtual hard drive, object storage is more like a file share accessed via APIs or a web interface.
(IV) Content Delivery Network : Content is stored in object storage, which is then distributed to multiple geographically distributed nodes to improve Internet consumption speeds.
5.1.2 Platform as a Service
PaaS both provides and relies on a very wide range of storage options.
PaaS may provide:
(A) Database as a Service : A multitenant database architecture that is directly consumable as a service. Users consume the database via APIs or direct SQL calls, depending on the offering. Each customer’s data is segregated and isolated from other tenants. Databases may be relational, flat, or any other common structure.
(B) Hadoop/MapReduce/Big Data as a Service : Big Data is data whose large scale, broad distribution, heterogeneity, and currency/timeliness require the use of new technical architectures and analytics. Hadoop and other Big Data applications may be offered as a cloud platform. Data is typically stored in Object Storage or another distributed file system. Data typically needs to be close to the processing environment, and may be moved temporarily as needed for processing.
(C) Application storage : Application storage includes any storage options built into a PaaS application platform and consumable via APIs that don’t fall into other storage categories.
PaaS may consume:
(A) Databases : Information and content may be directly stored in the database (as text or binary objects) or as files referenced by the database. The database itself may be a collection of IaaS instances sharing common back-end storage.
(B) Object/File Storage : Files or other data are stored in object storage, but only accessed via the PaaS API.
(C) Volume Storage : Data may be stored in IaaS volumes attached to instances dedicated to providing the PaaS service.
(D) Other : These are the most common storage models, but this is a dynamic area and other options may be available.
5.1.3 Software as a Service
As with PaaS, SaaS uses a very wide range of storage and consumption models. SaaS storage is always accessed via a web-based user interface or client/server application.
If the storage is accessible via API then it’s considered PaaS. Many SaaS providers also offer these PaaS APIs.
API - Application Program Interface
SQL - Structured Query Language, a programming language designed for managing data
SaaS may provide:
(A) Information Storage and Management : Data is entered into the system via the web interface and stored within the SaaS application (usually a back-end database). Some SaaS services offer data set upload options, or PaaS APIs.
(B) Content/File Storage : File-based content is stored within the SaaS application (e.g., reports, image files, documents) and made accessible via the web-based user interface.
SaaS may consume:
(A) Databases : Like PaaS, a large number of SaaS services rely on database back-ends, even for file storage.
(B) Object/File Storage : Files or other data are stored in object storage, but only accessed via the SaaS application.
(C) Volume Storage : Data may be stored in IaaS volumes attached to instances dedicated to providing the SaaS service.
5.2 Data (Information) Dispersion
Data (Information) Dispersion is a technique commonly used to improve data security without relying on encryption mechanisms.
Information dispersal algorithms (IDA for short) can provide high availability and assurance for data stored in the cloud by means of data fragmentation, and are common in many cloud platforms.
In a fragmentation scheme, a file f is split into n fragments; all of these are signed and distributed to n remote servers. The user then can reconstruct f by accessing m arbitrarily chosen fragments. The fragmentation mechanism can also be used for storing long-lived data in the cloud with high assurance.
When fragmentation is used along with encryption, data security is enhanced: an adversary has to compromise m cloud nodes in order to retrieve m fragments of the file f, and then has to break the encryption mechanism being used.
IDA - Information Dispersal Algorithm
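The m-of-n reconstruction property described above can be sketched as follows. This is a minimal illustration, not the scheme any particular cloud platform uses: the function names `disperse` and `reconstruct`, the prime modulus 257, and the polynomial-interpolation encoding are all assumptions made for the example. Signing of fragments is omitted, and because the first m fragments hold the original bytes, the sketch demonstrates availability only, not confidentiality.

```python
P = 257  # assumed field modulus: smallest prime above 255, so any byte fits


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through the given points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, n: int, m: int):
    """Split data into n fragments so that any m of them rebuild it."""
    if not 1 <= m <= n < P:
        raise ValueError("need 1 <= m <= n < 257")
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    fragments = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), m):
        block = padded[start:start + m]
        # The m bytes of a block are the polynomial's values at x = 1..m.
        points = list(zip(range(1, m + 1), block))
        for x in fragments:
            fragments[x].append(_interpolate(points, x))
    return fragments, len(data)


def reconstruct(available: dict, m: int, length: int) -> bytes:
    """Rebuild the data from any m fragments (keyed by fragment number)."""
    chosen = list(available.items())[:m]
    if len(chosen) < m:
        raise ValueError("not enough fragments")
    out = bytearray()
    for b in range(len(chosen[0][1])):
        points = [(x, frag[b]) for x, frag in chosen]
        for x in range(1, m + 1):
            out.append(_interpolate(points, x))
    return bytes(out[:length])
```

For example, `disperse(data, 5, 3)` produces five fragments, and passing any three of them to `reconstruct` returns the original bytes.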
5.3 Information Management
Before we can discuss specific data security controls, we need a model to understand and manage our information.
Information management includes the processes and policies for both understanding how your information is used, and governing that usage.
In the data security section, specific technical controls and recommendations are discussed to monitor and enforce this governance.
5.4 The Data Security Lifecycle
The Data Security Lifecycle is different from Information Lifecycle Management, reflecting the different needs of the security audience.
The lifecycle includes six phases from creation to destruction. Although it is shown as a linear progression, once created, data can bounce between phases without restriction, and may not pass through all stages (for example, not all data is eventually destroyed).
1. Create : Creation is the generation of new digital content, or the alteration/updating/modifying of existing content.
2. Store : Storing is the act of committing the digital data to some sort of storage repository and typically occurs nearly simultaneously with creation.
3. Use : Data is viewed, processed, or otherwise used in some sort of activity, not including modification.
4. Share : Information is made accessible to others, such as between users, to customers, and to partners.
5. Archive : Data leaves active use and enters long-term storage.
6. Destroy : Data is permanently destroyed using physical or digital means (e.g., cryptoshredding).
5.4.1 Locations and Access
The lifecycle represents the phases information passes through but doesn’t address its location or how it is accessed.
Locations
This can be illustrated by thinking of the lifecycle not as a single, linear operation, but as a series of smaller lifecycles running in different operating environments. At nearly any phase data can move into, out of, and between these environments.
Due to all the potential regulatory, contractual, and other jurisdictional issues it is extremely important to understand both the logical and physical locations of data.
Access
Once users know where the data lives and how it moves, they need to know who is accessing it and how. There are two factors here:
1. Who accesses the data?
2. How can they access it (device & channel)?
Data today is accessed using a variety of different devices. These devices have different security characteristics and may use different applications or clients.
5.4.2 Functions, Actors, and Controls
The next step identifies the functions that can be performed with the data, by a given actor (person or system) and a particular location.
Functions
There are three things we can do with a given datum:
(I) Access : View/access the data, including creating, copying, file transfers, dissemination, and other exchanges of information.
(II) Process : Perform a transaction on the data: update it, use it in a business processing transaction, etc.
(III) Store : Hold the data (in a file, database, etc.).
An actor (person, application, or system/process, as opposed to the access device) performs each function in a location.
Controls
A control restricts a list of possible actions down to allowed actions.
The table below shows one way to list the possibilities, which the user then maps to controls.
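As a rough sketch of the idea that a control narrows possible actions down to allowed ones, the combinations of function, actor, and location can be enumerated and compared against a policy. The actor names, location names, and the `allowed` policy below are hypothetical and chosen only for illustration.

```python
from itertools import product

FUNCTIONS = ("access", "process", "store")    # the three functions named above
ACTORS = ("employee", "contractor")           # hypothetical actors
LOCATIONS = ("internal", "public_cloud")      # hypothetical locations

# Every (function, actor, location) combination that is technically possible.
possible = set(product(FUNCTIONS, ACTORS, LOCATIONS))

# Hypothetical policy: the subset the organization actually permits.
allowed = {
    ("access", "employee", "internal"),
    ("process", "employee", "internal"),
    ("store", "employee", "internal"),
    ("access", "employee", "public_cloud"),
    ("access", "contractor", "internal"),
}


def is_allowed(function: str, actor: str, location: str) -> bool:
    """Return True if the policy permits this combination."""
    return (function, actor, location) in allowed


# Anything possible but not allowed needs a control to restrict it.
needs_control = sorted(possible - allowed)
```

Each entry in `needs_control` is a candidate for mapping to a specific control.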
5.5 Information Governance
Information governance includes the policies and procedures for managing information usage. It includes the following key features:
(A) Information Classification : High-level descriptions of important information categories. Unlike with data classification the goal isn’t to label every piece of data in the organization, but rather to define high-level categories like “regulated” and “trade secret” to determine which security controls may apply.
(B) Information Management Policies : Policies to define what activities are allowed for different information types.
(C) Location and Jurisdictional Policies : Where data may be geographically located, which also has important legal and regulatory ramifications.
(D) Authorizations : Define which types of employees/users are allowed to access which types of information.
(E) Ownership : Who is ultimately responsible for the information.
(F) Custodianship : Who is responsible for managing the information, at the behest of the owner.
5.6 Data Security
Data security includes the specific controls and technologies used to enforce information governance.
This has been broken out into three sections:
(I) detecting (and preventing) data migrating to the cloud,
(II) protecting data in transit to the cloud and between different providers/environments, and
(III) protecting data once it’s within the cloud.
5.6.1 Detecting and Preventing Data Migrations to the Cloud:
A common challenge organizations face with the cloud is managing data. Many organizations report individuals or business units moving often sensitive data to cloud services without the approval or even notification of IT or security.
Aside from traditional data security controls (like access controls or encryption), there are two other steps to help manage unapproved data moving to cloud services:
1. Monitor for large internal data migrations with Database Activity Monitoring (DAM) and File Activity Monitoring (FAM).
2. Monitor for data moving to the cloud with URL filters and Data Loss Prevention.
DAM - Database Activity Monitoring
FAM - File Activity Monitoring
Internal Data Migrations
Before data can move to the cloud it needs to be pulled from its existing repository.
Database Activity Monitoring can detect when an administrator or other user pulls a large data set or replicates a database, which could indicate a migration.
File Activity Monitoring provides similar protection for file repositories, such as file shares.
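The kind of check described above, flagging a user who pulls an unusually large data set, might look like the following sketch. It is not the interface of any real DAM product: the `QueryEvent` record, its fields, and the row threshold are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryEvent:
    """Hypothetical record of one monitored database query."""
    user: str
    statement: str
    rows_returned: int


ROW_THRESHOLD = 100_000  # assumed threshold; tune to the repository's normal usage


def users_with_bulk_pulls(events, threshold=ROW_THRESHOLD):
    """Return users whose total rows retrieved meet or exceed the threshold."""
    totals = Counter()
    for event in events:
        totals[event.user] += event.rows_returned
    return {user: rows for user, rows in totals.items() if rows >= threshold}
```

Summing per user rather than per query also catches a migration that is split into many smaller queries.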
Movement to the Cloud
A combination of URL filtering (web content security gateways) and Data Loss Prevention (DLP) can detect data moving from the enterprise into the cloud.
URL filtering allows you to monitor (and prevent) users connecting to cloud services.
Since the administrative interfaces for these services typically use different addresses than the consumption side, the user can distinguish between someone accessing an administrative console versus a user accessing an application already hosted with the provider.
Look for a tool that offers a cloud services list and keeps it up to date, as opposed to one that requires the user to create a custom category and manage the destination addresses.
For greater granularity, use Data Loss Prevention.
DLP tools look at the actual data/content being transmitted, not just the destination.
Thus the user can generate alerts (or block) based on the classification of the data.
For example, the user can allow corporate private data to go to an approved cloud service but block the same content from migrating to an unapproved service.
The insertion point of the DLP solution can determine how successfully data leakage can be detected.
For example, cloud solutions are available to various users (e.g., employees, vendors, customers) outside of the corporate network environment, and that access bypasses any DLP solution inserted at the corporate boundary.
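Combining the destination (as URL filtering sees it) with the data classification (as DLP sees it) can be sketched as a small decision function. The host names, classification labels, and the three outcomes below are hypothetical; real products express such policies in their own configuration formats.

```python
# Hypothetical host names and classification labels, for illustration only.
APPROVED_CLOUD_HOSTS = {"files.approved-cloud.example"}
OTHER_CLOUD_HOSTS = {"share.unapproved-cloud.example"}


def decide(destination_host: str, classification: str) -> str:
    """Return 'allow', 'alert', or 'block' for one outbound transfer."""
    is_cloud = destination_host in APPROVED_CLOUD_HOSTS | OTHER_CLOUD_HOSTS
    # Non-cloud destinations are outside the scope of this sketch.
    if not is_cloud or classification == "public":
        return "allow"
    if destination_host in APPROVED_CLOUD_HOSTS:
        return "allow"
    # Unapproved cloud service: block the most sensitive label, alert on the rest.
    return "block" if classification == "corporate_private" else "alert"
```

With this policy, corporate private data may go to the approved service but is blocked from the unapproved one, which mirrors the example in the text.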
5.6.2 Protecting Data Moving To (And Within) the Cloud
In both public and private cloud deployments, and throughout the different service models, it’s important to protect data in transit.
This includes:
(A) Data moving from traditional infrastructure to cloud providers, including public/private, internal/external and other permutations.
(B) Data moving between cloud providers.
(C) Data moving between instances (or other components) within a given cloud.
There are three options (in order of preference):
(1) Client/Application Encryption : Data is encrypted on the endpoint or server before being sent across the network or is already stored in a suitable encrypted format. This includes local client (agent-based) encryption (e.g., for stored files) or encryption integrated in applications.
(2) Link/Network Encryption : Standard network encryption techniques including SSL, VPNs, and SSH. Can be hardware or software. End to end is preferable but may not be viable in all architectures.
(3) Proxy-Based Encryption : Data is transmitted to a proxy appliance or server, which encrypts it before sending it further on the network. This is often the preferred option for integrating with legacy applications, but it is not generally recommended.
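For the link/network encryption option, a client-side sketch using Python's standard `ssl` module is shown below. TLS is the successor to SSL; the choice of TLS 1.2 as a minimum version and the helper names are assumptions for the example, not requirements stated in this guidance.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server's certificate."""
    # create_default_context() enables certificate verification and
    # hostname checking by default.
    context = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_encrypted_channel(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS before sending any data."""
    raw = socket.create_connection((host, port))
    try:
        return make_client_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the plain socket if the handshake fails
        raise
```

Data written to the returned socket is encrypted on the wire between the client and the endpoint that terminates TLS, which may or may not be end to end depending on the architecture.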
5.6.3 Protecting Data in the Cloud
The following are some of the more useful technologies and best practices for securing data within various cloud models.
5.6.3.1 Content Discovery
Content discovery includes the tools and processes to identify sensitive information in storage.
It allows the organization to define policies based on information type, structure, or classification and then scans stored data using advanced content analysis techniques to identify locations and policy violations.
Content discovery is normally a feature of Data Loss Prevention tools; for databases, it is sometimes available in Database Activity Monitoring products.
Scanning can be performed by accessing file shares or by a local agent running on an operating system.
The tool must be “cloud aware” and capable of working within your cloud environment (e.g., able to scan object storage). Content discovery may also be available as a managed service.
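A very small sketch of content analysis over stored objects is shown below. It looks for payment-card-like numbers and confirms candidates with the Luhn checksum. The in-memory `objects` mapping stands in for whatever listing a real object store would provide, and both the pattern and the function names are assumptions; commercial tools use far richer analysis.

```python
import re

# Assumed pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_objects(objects: dict) -> dict:
    """Map each object name to the card-like numbers found in its text."""
    findings = {}
    for name, text in objects.items():
        hits = []
        for match in CARD_PATTERN.finditer(text):
            candidate = re.sub(r"[ -]", "", match.group())
            if luhn_valid(candidate):
                hits.append(candidate)
        if hits:
            findings[name] = hits
    return findings
```

The checksum step reduces false positives from order numbers and other long digit strings that merely look like card numbers.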
5.6.3.2 IaaS Encryption
5.6.3.2.1 Volume Storage Encryption :
Volume encryption protects from the following risks:
(I) Protects volumes from snapshot cloning/exposure
(II) Protects volumes from being explored by the cloud provider (and private cloud admins)
(III) Protects volumes from being exposed by physical loss of drives (more for compliance than a real-world security issue)
IaaS volumes can be encrypted using three methods:
(I) Instance-managed encryption : The encryption engine runs within the instance, and the key is stored in the volume but protected by a passphrase or keypair.
(II) Externally managed encryption : The encryption engine runs in the instance, but the keys are managed externally and issued to the instance on request.
(III) Proxy encryption : In this model you connect the volume to a special instance or appliance/software, and then connect your instance to the encryption instance. The proxy handles all crypto operations and may keep keys either onboard or external.
5.6.3.2.2 Object Storage Encryption :
Object storage encryption protects from many of the same risks as volume storage.
Since object storage is more often exposed to public networks, it also allows the user to implement Virtual Private Storage.
Like a VPN, a VPS allows use of a public shared infrastructure while still protecting data, since only those with the encryption keys can read the data even if it is otherwise exposed.
(A) File/Folder encryption and Enterprise Digital Rights Management: Use standard file/folder encryption tools or EDRM to encrypt the data before placing in object storage.
(B) Client/Application encryption : When object storage is used as the back-end for an application (including mobile applications), encrypt the data using an encryption engine embedded in the application or client.
(C) Proxy encryption : Data passes through an encryption proxy before being sent to object storage.
VPS - Virtual Private Storage
5.6.3.3 PaaS Encryption
Since PaaS is so diverse, the following list may not cover all potential options:
(I) Client/application encryption : Data is encrypted in the PaaS application or the client accessing the platform.
(II) Database encryption : Data is encrypted in the database using encryption built in and supported by the database platform.
(III) Proxy encryption : Data passes through an encryption proxy before being sent to the platform.
(IV) Other : Additional options may include APIs built into the platform, external encryption services, and other variations.
5.6.3.4 SaaS Encryption
SaaS providers may use any of the options previously discussed. It is recommended to use per-customer keys when possible to better enforce multi-tenancy isolation.
The following options are for SaaS consumers:
(A) Provider-managed encryption : Data is encrypted in the SaaS application and generally managed by the provider.
(B) Proxy encryption : Data passes through an encryption proxy before being sent to the SaaS application.
Encryption operations should use whatever encryption method is most appropriate, which may include shared keys or public/private keypairs and an extensive PKI/PKO (Public Key Infrastructure/Operations) structure.
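One way a provider might obtain the per-customer keys recommended above is to derive each tenant's key from a master key. The sketch below uses HMAC-SHA256 for the derivation; that choice, the `tenant-key:` label, and the function name are assumptions for illustration, and a production system would keep the master key in a dedicated key management system.

```python
import hashlib
import hmac
import secrets


def derive_tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a 32-byte key that is unique to one tenant."""
    # The label keeps these keys separate from any other use of the master key.
    message = b"tenant-key:" + tenant_id.encode("utf-8")
    return hmac.new(master_key, message, hashlib.sha256).digest()


# Hypothetical usage: the master key would normally come from a key manager.
master = secrets.token_bytes(32)
key_a = derive_tenant_key(master, "customer-a")
key_b = derive_tenant_key(master, "customer-b")
```

Because each tenant's data is encrypted under a different key, exposure of one tenant's key does not reveal another tenant's data.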
5.6.4 Data Loss Prevention
Data Loss Prevention (DLP) products, based on central policies, identify, monitor, and protect data at rest, in motion, and in use through deep content analysis.
DLP can provide options for how data found in violation of policy is to be handled.
Data can be blocked (stopping a workflow) or allowed to proceed after remediation by encryption using methods such as DRM, ZIP, or OpenPGP.
DLP is typically used for content discovery and to monitor data in motion using the following options:
(A) Dedicated appliance/server : Standard hardware placed at a network chokepoint between the cloud environment and the rest of the network/Internet or within different cloud segments.
(B) Virtual appliance
(C) Endpoint agent
(D) Hypervisor-agent : The DLP agent is embedded or accessed at the hypervisor level, as opposed to running in the instance.
(E) DLP SaaS : DLP is integrated into a cloud service (e.g., hosted email) or offered as a standalone service (typically content discovery).
5.6.5 Database and File Activity Monitoring
Database Activity Monitoring (DAM) is defined as:
Database Activity Monitors capture and record, at a minimum, all Structured Query Language (SQL) activity in real time or near real time, including database administrator activity, across multiple database platforms; and can generate alerts on policy violations.
DAM supports near real time monitoring of database activity and alerts based on policy violations, such as SQL injection attacks or an administrator replicating the database without approval.
DAM tools for cloud environments are typically agent-based connecting to a central collection server (which is typically virtualized).
DAM is used with dedicated database instances for a single customer, although it may be available for PaaS in the future.
File Activity Monitoring (FAM) is defined as:
Products that monitor and record all activity within designated file repositories at the user level, and generate alerts on policy violations.
FAM for cloud requires use of an endpoint agent or placing a physical appliance between the cloud storage and the cloud consumers.
5.6.6 Application Security
A large percentage of data exposures are the result of attacks at the application layer, particularly for web applications.
5.6.7 Privacy Preserving Storage
Almost all cloud-based storage systems require some authentication of participants (cloud user and/or CSP) to establish trust relations, either for only one endpoint of communication or for both.
Although cryptographic certificates can offer sufficient security for many of these purposes, they do not typically cater to privacy because they are bound to the identity of a real person (cloud user).
Any usage of such a certificate exposes the identity of the holder to the party requesting authentication.
Example : There are many scenarios (e.g., storage of Electronic Health Records) where the use of such certificates unnecessarily reveals the identity of their holder.
Over the past 10-15 years, a number of technologies have been developed to build systems in a way that they can be trusted, like normal cryptographic certificates, while at the same time protecting the privacy of their holder (i.e., hiding the real holder’s identity).
Such attribute-based credentials are issued just like ordinary cryptographic credentials (e.g., X.509 credentials) using a digital (secret) signature key.
However, attribute-based credentials (ABCs) allow their holder to transform them into a new credential that contains only a subset of the attributes contained in the original credential.
Still, these transformed credentials can be verified just like ordinary cryptographic credentials (using the public verification key of the issuer) and offer the same strong security.
5.6.8 Digital Rights Management (DRM)
At its core, Digital Rights Management encrypts content, and then applies a series of rights.
Rights can be as simple as preventing copying, or as complex as specifying group or user-based restrictions on activities like cutting and pasting, emailing, changing the content, etc.
Any application or system that works with DRM protected data must be able to interpret and implement the rights, which typically also means integrating with the key management system.
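The rights half of this model can be sketched as a lookup from group to permitted actions. The `ProtectedDocument` type, the group names, and the action names are hypothetical, and the encryption and key management that a real DRM system requires are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedDocument:
    """Hypothetical DRM-protected item: a name plus per-group rights."""
    name: str
    rights: dict = field(default_factory=dict)  # group name -> set of actions


def is_permitted(doc: ProtectedDocument, user_groups, action: str) -> bool:
    """Return True if any of the user's groups grants the requested action."""
    return any(action in doc.rights.get(group, set()) for group in user_groups)


# Hypothetical policy: finance may view and print, partners may only view.
forecast = ProtectedDocument(
    name="forecast.xlsx",
    rights={"finance": {"view", "print"}, "partners": {"view"}},
)
```

An application that opens the document would call a check like `is_permitted` before enabling operations such as printing or copying.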
There are two broad categories of Digital Rights Management:
(A) Consumer DRM : is used to protect broadly distributed content like audio, video, and electronic books destined for a mass audience. There are a variety of different technologies and standards, and the emphasis is on one-way distribution.
Example : Consumer DRM offers good protection for distributing content to customers, but it does not have a good track record: most technologies have been cracked at some point.
(B) Enterprise DRM : is used to protect the content of an organization internally and with business partners. The emphasis is on more complex rights, policies, and integration within business environments and particularly with the corporate Directory Service.
Example : Enterprise DRM can secure content stored in the cloud well but requires deep infrastructure integration. It’s most useful for document based content management and distribution.
5.7 Recommendations
(1) Understand the cloud storage architecture in use, which will help determine security risk and potential controls.
(2) Choose storage with data dispersion when available.
(3) Use the Data Security Lifecycle to identify security exposures and determine the most appropriate controls.
(4) Monitor key internal databases and file repositories with DAM and FAM to identify large data migrations, which could indicate data migrating to the cloud.
(5) Monitor employee Internet access with URL filtering and/or DLP tools to identify sensitive data moving to the cloud. Select tools that include predefined categories for cloud services. Consider using filtering to block unapproved activity.
(6) Encrypt all sensitive data moving to or within the cloud at the network layer, or at nodes before network transmission. This includes all service and deployment models.
(7) When using any data encryption, pay particular attention to key management (see Domain 11).
(8) Use content discovery to scan cloud storage and identify exposed sensitive data.
(9) Encrypt sensitive volumes in IaaS to limit exposure due to snapshots or unapproved administrator access. The specific technique will vary depending on operational needs.
(10) Encrypt sensitive data in object storage, usually with file/folder or client/agent encryption.
(11) Encrypt sensitive data in PaaS applications and storage. Application-level encryption is often the preferred option, especially since few cloud databases support native encryption.
(12) When using application encryption, keys should be stored external to the application whenever possible.
(13) If encryption is needed for SaaS, try to identify a provider that offers native encryption. Use proxy encryption if that isn’t available and/or trust levels must be assured.
(14) Use DLP to identify sensitive data leaking from cloud deployments. It is typically only available for IaaS, and may not be viable for all public cloud providers.
(15) Monitor sensitive databases with DAM and generate alerts on security policy violations. Use a cloud-aware tool.
(16) Consider privacy preserving storage when offering infrastructure or applications where normal access could reveal sensitive user information.
(17) Remember that most large data security breaches are the result of poor application security.
(18) Cloud providers should not only follow these practices, but expose data security tools and options to their customers.
(19) Removal of data from a cloud vendor either due to expiry of contract or any other reason should be covered in detail while setting up the SLA. This should cover deletion of user accounts, migration or deletion of data from primary / redundant storage, transfer of keys, etc.
5.8 Requirements
(1) Use the Data Security Lifecycle to identify security exposures and determine the most appropriate controls.
(2) Due to all the potential regulatory, contractual, and other jurisdictional issues it is extremely important to understand both the logical and physical locations of data.
(3) Monitor employee Internet access with URL filtering and/or DLP tools to identify sensitive data moving to the cloud.
(4) Encrypt all sensitive data moving to or within the cloud at the network layer, or at nodes before network transmission.
(5) Encrypt sensitive volumes in IaaS to limit exposure due to snapshots or unapproved administrator access.
(6) Encrypt sensitive data in PaaS applications and storage.