INCIDENT RESPONSE
Incident Response (IR) is one of the cornerstones of information security management.
Even the most diligent planning, implementation, and execution of preventive security controls cannot completely eliminate the possibility of an attack on the information assets.
One of the central questions for organizations moving into the cloud must therefore be: what must be done to enable efficient and effective handling of security incidents that involve resources in the cloud?
Cloud computing does not necessitate a new conceptual framework for Incident Response; rather it requires that the organization appropriately maps its extant IR programs, processes, and tools to the specific operating environment it embraces.
This is consistent with the guidance found throughout this document; a gap analysis of the controls that encompass organizations’ IR function should be carried out in a similar fashion.
This domain seeks to identify those gaps pertinent to IR that are created by the unique characteristics of cloud computing.
Security professionals may use this as a reference when developing response plans and conducting other activities during the preparation phase of the IR lifecycle.
To understand the challenges cloud computing poses to incident handling, we must examine how the special characteristics of cloud computing and the various deployment and service models affect it.
This domain is organized in accord with the commonly accepted Incident Response Lifecycle as described in the National Institute of Standards and Technology Computer Security Incident Handling Guide (NIST 800-61) [1].
After establishing the characteristics of cloud computing that impact IR most directly, each subsequent section addresses a phase of the lifecycle and explores the potential considerations for responders.
This domain will address the following topics:
(I) Cloud computing impact on Incident Response
(II) Incident Response Lifecycle
(III) Forensic accountability
9.1 Cloud Computing Characteristics that Impact Incident Response
Although cloud computing brings change on many levels, certain characteristics [2] of cloud computing bear more direct challenges to IR activities than others [3].
First, the on-demand self-service nature of cloud computing environments means that a cloud customer may find it hard or even impossible to receive the required co-operation from their cloud service provider (CSP) when handling a security incident.
Depending on the service and deployment models used, interaction with the IR function at the CSP will vary.
Indeed, the extent to which security incident detection, analysis, containment, and recovery capabilities have been engineered into the service offering is a key question for provider and customer to address.
Second, the resource pooling practiced by cloud services, in addition to the rapid elasticity offered by cloud infrastructures, may dramatically complicate the IR process, especially the forensic activities carried out as part of the incident analysis.
Forensics has to be carried out in a highly dynamic environment, which challenges basic forensic necessities [4] such as establishing the scope of an incident, the collection and attribution of data, preserving the semantic integrity of that data, and maintaining the stability of evidence overall.
These problems are exacerbated when cloud customers attempt to carry out forensic activities, since they operate in a non-transparent environment (which underscores the necessity of support by the cloud provider as mentioned above).
Third, resource pooling as practiced by cloud services raises privacy concerns: telemetry and artifacts associated with an incident (e.g., logs, netflow data, memory, machine images, and storage) must be collected and analyzed without compromising the privacy of co-tenants.
This is a technical challenge that must be addressed primarily by the provider. It is up to the cloud customers to ensure that their cloud service provider has appropriate collection and data separation steps and can provide the requisite incident-handling support.
Fourth, although it is not described as an essential cloud characteristic, cloud computing may lead to data crossing geographic or jurisdictional boundaries without the cloud customer's explicit knowledge.
The ensuing legal and regulatory implications may adversely affect the incident handling process by placing limitations on what may or may not be done and/or prescribing what must or must not be done during an incident across all phases of the lifecycle [5].
It is advisable that an organization includes representatives from its legal department on the Incident Response team to provide guidance on these issues.
Cloud computing also presents opportunities for incident responders. Cloud continuous monitoring systems can reduce the time it takes to undertake an incident handling exercise or deliver an enhanced response to an incident.
Virtualization technologies, and the elasticity inherent in cloud computing platforms, may allow for more efficient and effective containment and recovery, often with less service interruption than might typically be experienced with more traditional data center technologies.
Also, investigation of incidents may be easier in some respects, as virtual machines can easily be moved into lab environments where runtime analysis can be conducted and forensic images taken and examined.
9.2 The Cloud Architecture Security Model as a Reference
Using the architectural framework and security controls review advocated in Domain 1 (see Cloud Reference Model Figure 1.5.2a) can be valuable in identifying what technical and process components are owned by which organization and at which level of the “stack.”
Cloud service models (IaaS, PaaS, SaaS) differ appreciably in the amount of visibility and control a customer has over the underlying IT systems and other infrastructure that deliver the computing environment.
This has implications for all phases of Incident Response as it does with all other domains in this guidance document.
For instance, in a SaaS solution, response activities will likely reside almost entirely with the CSP, whereas in IaaS, a greater degree of responsibility and capability for detecting and responding to security incidents may reside with the customer.
However, even in IaaS there are significant dependencies on the CSP.
Data from physical hosts, network devices, shared services, security devices like firewalls and any management backplane systems must be delivered by the CSP.
Some providers are already provisioning the capability to deliver this telemetry to their customers and managed security service providers are advertising cloud-based solutions to receive and process this data.
Given the complexities, the Security Control Model described in Domain 1 (Figure 1.5.1c) and the activities an organization performs to map security controls to its particular cloud deployment should inform IR planning, and vice versa.
Traditionally, controls for IR have concerned themselves more narrowly with higher-level organizational requirements; however, security professionals must take a more holistic view in order to be truly effective.
Those responsible for IR should be fully integrated into the selection, purchase, and deployment of any technical security control that may directly, or even indirectly, affect response. At a minimum, this process may help in mapping of roles/responsibilities during each phase of the IR lifecycle.
Cloud deployment models (public, private, hybrid, community) are also considerations when reviewing IR capabilities in a cloud deployment; the ease of gaining access to IR data varies for each deployment model.
It should be self-evident that the same continuum of control/responsibility exists here as well. In this domain, the primary concern is with the more public end of the continuum.
The authors assume that the more private the cloud, the more control the user will have to develop the appropriate security controls or have those controls delivered by a provider to the user’s satisfaction.
9.3 Incident Response Lifecycle Examined
9.3.1 Preparation
Identifying the challenges (and opportunities) for IR should be a formal project undertaken by information security professionals within the cloud customer’s organization prior to migration to the cloud.
If the level of IR expertise within the organization is deemed insufficient, experienced external parties should be consulted. This exercise should be undertaken during every refresh of the enterprise Incident Response Plan.
In each lifecycle phase discussed below, the questions raised and suggestions provided can serve to inform the customer’s planning process. Integrating the concepts discussed into a formally documented plan should serve to drive the right activities to remediate any gaps and take advantage of any opportunities.
Preparation begins with a clear understanding and full accounting of where the customer’s data resides in motion and at rest.
Because the customer's information assets may traverse organizational, and likely geographic, boundaries, threat modeling is necessary on both the physical and logical planes.
Data flow diagrams that map to physical assets and show organizational, network, and jurisdictional boundaries may serve to highlight any dependencies that could arise during a response.
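As an illustration of the mapping described above, the following minimal sketch models data flows between assets and flags those that cross an organizational or jurisdictional boundary. The class names, the function name, and the example assets and locations are all hypothetical; a real inventory would come from the customer's own asset and data-flow documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A place where customer data rests or is processed (hypothetical model)."""
    name: str
    owner: str         # organization operating the asset
    jurisdiction: str  # legal jurisdiction of the physical location


@dataclass(frozen=True)
class Flow:
    """A directed data flow between two assets."""
    source: Asset
    target: Asset


def cross_boundary_flows(flows):
    """Return (flow, reasons) pairs for flows that cross an organizational
    or jurisdictional boundary."""
    flagged = []
    for flow in flows:
        reasons = []
        if flow.source.owner != flow.target.owner:
            reasons.append("organizational")
        if flow.source.jurisdiction != flow.target.jurisdiction:
            reasons.append("jurisdictional")
        if reasons:
            flagged.append((flow, reasons))
    return flagged


# Illustrative inventory; every name and location here is made up.
app = Asset("web-app", owner="customer", jurisdiction="DE")
db = Asset("managed-db", owner="csp", jurisdiction="DE")
backup = Asset("backup-store", owner="csp", jurisdiction="US")
flows = [Flow(app, db), Flow(db, backup)]

for flow, reasons in cross_boundary_flows(flows):
    print(f"{flow.source.name} -> {flow.target.name}: {', '.join(reasons)}")
```

Each flagged flow marks a point where a response may depend on another organization or on another jurisdiction's rules, and is therefore a candidate for explicit treatment in the IR plan.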
Since multiple organizations are now involved, Service Level Agreements (SLAs) and contracts between the parties become the primary means of communicating and enforcing expectations for responsibilities in each phase of the IR lifecycle.
It is advisable to share IR plans with the other parties and to precisely define and clarify shared or unclear terminology. When possible, any ambiguities should be cleared up in advance of an incident.
It is unreasonable to expect CSPs to create separate IR plans for each customer. However, the existence of some (or all) of the following points in a contract/SLA should give the customer organization some confidence that its provider has done some advance planning for Incident Response:
(A) Points of Contact, communication channels, and availability of IR teams for each party
(B) Incident definitions and notification criteria, both from provider to customer as well as to any external parties
(C) CSP’s support to customer for incident detection (e.g., available event data, notification about suspicious events, etc.)
(D) Definition of roles/responsibilities during a security incident, explicitly specifying support for incident handling provided by the CSP (e.g., forensic support via collection of incident data/artifacts, participation/support in incident analysis, etc.)
(E) Specification of regular IR testing carried out by the parties to the contract and whether results will be shared
(F) Scope of post-mortem activities (e.g., root cause analysis, IR report, integration of lessons learned into security management, etc.)
(G) Clear identification of responsibilities around IR between provider and consumer as part of SLA
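One way to use points (A) to (G) above during contract review is as a simple checklist. The sketch below is only an illustration: the key names and the review result are assumptions, and deciding whether a contract really addresses a point still requires human judgment.

```python
# Checklist keys mirror points (A) to (G) above; the key names are illustrative.
REQUIRED_IR_TERMS = {
    "points_of_contact": "(A) Points of contact and communication channels",
    "incident_definitions": "(B) Incident definitions and notification criteria",
    "detection_support": "(C) CSP support for incident detection",
    "roles_responsibilities": "(D) Roles and responsibilities during an incident",
    "ir_testing": "(E) Regular IR testing and sharing of results",
    "post_mortem": "(F) Scope of post-mortem activities",
    "responsibility_split": "(G) IR responsibilities between provider and consumer",
}


def missing_ir_terms(contract_terms):
    """Return descriptions of checklist points the contract does not cover.

    `contract_terms` maps a checklist key to True if a reviewer found the
    point addressed in the contract/SLA. Absent keys count as not covered.
    """
    return [
        description
        for key, description in REQUIRED_IR_TERMS.items()
        if not contract_terms.get(key, False)
    ]


# Hypothetical review result for one provider.
review = {
    "points_of_contact": True,
    "incident_definitions": True,
    "detection_support": False,
    "roles_responsibilities": True,
}

for gap in missing_ir_terms(review):
    print("Not covered:", gap)
```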
Once the roles and responsibilities have been determined, the customer can now properly resource, train, and equip its Incident Responders to handle the tasks that they will have direct responsibility for.
For example, if a customer-controlled application resides in a PaaS model and the cloud provider has agreed to provide (or allow retrieval of) platform-specific logging, having the technologies/tools and personnel available to receive, process, and analyze those types of logs is an obvious need.
For IaaS and PaaS, aptitude with virtualization and the means to conduct forensics and other investigation on virtual machines will be integral to any response effort.
Whether the particular expertise required is organic to the customer organization or is outsourced to a third party should be determined during the preparation phase. Note that outsourcing brings another set of contracts/NDAs to manage.
Between all involved parties, communication channels must be prepared.
Parties should consider the means by which sensitive information is transmitted between them, ensuring that out-of-band channels are available and that cryptographic schemes are used to protect the confidentiality, integrity, and authenticity of information.
Communication during IR can be facilitated by utilizing existing standards for the purpose of sharing indicators of compromise or to actively engage another party in an investigation.
For example, the Incident Object Description Exchange Format (IODEF) [6] as well as the associated Real-time Inter-network Defense (RID) standard [7,8] were developed in the Internet Engineering Task Force (IETF) and are also incorporated in the International Telecommunication Union’s (ITU) Cybersecurity Exchange (CYBEX) project.
IODEF provides a standard XML schema used to describe an incident, while RID describes a standard method for communicating the incident information between entities, which include, at a minimum, a CSP and a tenant.
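To give a sense of what an IODEF document looks like, the following minimal sketch builds a simplified, IODEF-style report using Python's standard library. It follows the general shape of RFC 5070 (namespace, `IODEF-Document` root, `Incident` with `IncidentID`, `ReportTime`, and `Description`) but omits elements a complete report would carry, such as `Assessment` and `Contact`, and it is not validated against the schema. The function name, the incident identifier, and the reporter name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

IODEF_NS = "urn:ietf:params:xml:ns:iodef-1.0"  # namespace defined in RFC 5070


def build_iodef_report(incident_id, reporter, description):
    """Build a simplified IODEF-style incident report as an XML string.

    The result is a sketch only: it is not schema-validated and leaves out
    elements that a complete IODEF report would include.
    """
    ET.register_namespace("", IODEF_NS)

    def tag(name):
        return f"{{{IODEF_NS}}}{name}"

    doc = ET.Element(tag("IODEF-Document"), {"version": "1.00", "lang": "en"})
    incident = ET.SubElement(doc, tag("Incident"), {"purpose": "reporting"})

    # The "name" attribute identifies the party that assigned the ID.
    ident = ET.SubElement(incident, tag("IncidentID"), {"name": reporter})
    ident.text = incident_id

    report_time = ET.SubElement(incident, tag("ReportTime"))
    report_time.text = datetime.now(timezone.utc).isoformat(timespec="seconds")

    desc = ET.SubElement(incident, tag("Description"))
    desc.text = description

    return ET.tostring(doc, encoding="unicode")


# Hypothetical identifiers, for illustration only.
print(build_iodef_report("CSP-0001", "csp.example.com", "Suspicious API calls"))
```

In practice, both parties would agree in advance on which elements are mandatory in exchanged reports and on the transport (for example RID) used to deliver them.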
The most important part of preparing for an incident is testing the plan. Tests should be thorough and mobilize all the parties who are likely going to be involved during a true incident.
It is unlikely that a CSP has resources to participate in tests with each of its customers; the customer should therefore consider role-playing as a means to identify which tasking or requests for information are likely to be directed at the CSP.
This information should be used to inform future discussions with the provider while in the preparation phase. Another possibility is for the customer to volunteer to participate in any testing the CSP may have planned.
9.3.2 Detection and Analysis
As outlined above, cloud computing poses challenges both for the availability of the data needed to detect and analyze incidents and for its interpretation. Firstly, availability of data to a large extent depends on what the cloud provider supplies to the customer and may be limited by the highly dynamic nature of cloud computing.
Secondly, analysis is complicated by the fact that the analysis at least partly concerns non-transparent, provider-owned infrastructure, of which the customer usually has little knowledge and – again – by the dynamic nature of cloud computing, through which the interpretation of data becomes hard, sometimes even impossible.
Putting aside the technical challenges of incident analysis for a moment, the question of how a digital investigation in the cloud should be conducted in order to maximize the probative value (i.e., credibility) of the evidence remains largely unanswered at the time of writing.
Hence, until legal cases involving cloud incidents have become more commonplace and commonly accepted best practice guidelines exist, analysis results for cloud security incidents incur the risk of not standing up in court.
Until standards, methods, and tools for detecting and analyzing security incidents have caught up with the technical developments introduced by cloud computing, incident detection and analysis will remain especially challenging for cloud environments.
Cloud customers must rise to this challenge by making sure that they have access to (1) the data sources and information that are relevant for incident detection/analysis as well as (2) appropriate forensic support for incident analysis in the cloud environment(s) they are using.
9.3.3 Data Sources
It is imperative for the customer organization to conduct an assessment of what logs (and other data) are available, how they are collected and processed, and finally, how and when they may be delivered by the CSP.
The main data source for the detection and subsequent analysis of incidents on the customer side is logging information.
The following issues regarding logging information must be taken into consideration:
(A) Which information should be logged? Examples of log types that may be relevant are audit logs (e.g. network, system, application, and cloud administration roles and accesses, backup and restore activities, maintenance access, and change management activity), error logs (e.g., kernel messages of hypervisors, operating systems, applications, etc.), security-specific logs (e.g., IDS logs, firewall logs, etc.), performance logs, etc. Where existing logging information is not sufficient, additional log sources have to be negotiated/added.
(B) Is the logged information consistent and complete? A prime example of inconsistent source logging information is failure to synchronize clocks of log sources. Similarly, incomplete information regarding the time zone in which the timestamps in logs are recorded makes it impossible to accurately interpret the collected data during analysis.
(C) Is the cloud’s dynamic nature adequately reflected in the logged information? The dynamic behavior of cloud environments also is a frequent reason for inconsistent and/or incomplete logging.
For example, as new cloud resources (VMs, etc.) are brought online to service demand, the log information produced by the new resource instance will need to be added to the stream of log data.
Another likely problem is the failure to make dynamic changes in the environment explicit in the log information.
For example, consider the case in which web service requests to a certain PaaS component are logged but may be serviced dynamically by any of several instances of this service. If the logs do not record which instance served which request, proper analysis may be hard or impossible, e.g., if the root cause of an incident is a single compromised instance.
(D) Are overall legal requirements met? Privacy issues regarding co-tenants, regulation regarding log data in general and personally identifiable information in particular, etc., may place limitations on the collection, storage, and usage of collected log data. These regulatory issues must be understood and addressed for each jurisdiction where the company’s data is processed or stored.
(E) What log retention patterns are required? Legal and compliance requirements will direct specific log retention patterns. Cloud customers should understand and define any extended log retention patterns they need to support incident analysis/forensics over time.
(F) Are the logs tamper-resistant? Ensuring that logs are in tamper-resistant stores is critical for accurate legal and forensic analysis. Consider the use of write-once devices, separation of servers used to store logs from application servers, and access controls to servers storing logs as critical aspects of this requirement.
(G) In which format is logging information communicated? Normalization of logging data is a considerable challenge. The use of open formats (such as the emerging CEE [9]) may ease processing at the customer side.
CEE - Common Event Expression
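Item (B) above notes that unsynchronized clocks and missing time zone information make collected logs hard to interpret. One common mitigation on the customer side is to normalize every timestamp to UTC at ingestion and to refuse timestamps whose offset is unknown rather than guess. The following minimal sketch assumes ISO 8601 timestamps; the function name, the source names, and the sample entries are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(timestamp, assumed_offset_hours=None):
    """Parse an ISO 8601 timestamp and return it as a UTC datetime.

    If the timestamp carries no offset, `assumed_offset_hours` must state the
    source's documented offset explicitly; otherwise a ValueError is raised,
    because guessing silently would corrupt the incident timeline.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        if assumed_offset_hours is None:
            raise ValueError(f"no time zone in {timestamp!r}")
        offset = timezone(timedelta(hours=assumed_offset_hours))
        parsed = parsed.replace(tzinfo=offset)
    return parsed.astimezone(timezone.utc)


# Hypothetical entries from two log sources with different conventions.
entries = [
    ("hypervisor", "2024-03-01T12:00:05+01:00", None),
    ("firewall", "2024-03-01 10:59:58", 0),  # source documented as UTC
]

# Sort all events on a single UTC timeline.
timeline = sorted((to_utc(ts, off), source) for source, ts, off in entries)
for when, source in timeline:
    print(when.isoformat(), source)
```

Normalizing timestamps does not correct clock drift between sources; it only removes ambiguity about offsets, so clock synchronization at the log sources is still needed.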
(1) Some incidents can be detected only by the cloud provider because they occur within the infrastructure owned by the provider. The SLAs must therefore require the cloud provider to inform cloud customers in a timely and reliable manner so that the agreed IR can take place.
For other incidents (perhaps even detectable by the customer) the cloud provider may be in a better position for detection. Cloud customers should select cloud providers that optimally assist in the detection of incidents by correlating and filtering available log data.
(2) The amount of data produced from the cloud deployment may be considerable. It may be necessary to investigate the cloud provider's options for filtering log data within the cloud service before it is sent to the customer, in order to reduce network load and the customer's internal processing.
Additional considerations include the level of analysis or correlation performed by the CSP and the cloud tenant to identify possible incidents prior to forensics. If analysis is performed at the CSP, the escalation and hand-off points for the incident investigation must be determined.
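Item (F) in the list above asks for tamper-resistant log storage. A complementary technique, shown here only as a sketch, is to make logs tamper-evident by chaining each record to the hash of its predecessor, so that any later modification of a stored record can be detected. This does not replace write-once storage or access controls: an attacker who can rewrite the entire chain can recompute all hashes, and truncation at the end of the chain is not detected unless the latest hash is also recorded somewhere else. The function names and record layout are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first record


def _digest(prev_hash, message):
    """Hash a record's message together with the previous record's hash."""
    payload = json.dumps({"prev": prev_hash, "msg": message}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain, message):
    """Append a log message, binding it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "msg": message,
        "hash": _digest(prev_hash, message),
    })


def verify_chain(chain):
    """Return True if every record still matches its hash and links to its
    predecessor. Truncation at the end of the chain is not detected."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["msg"]):
            return False
        prev_hash = record["hash"]
    return True


# Illustrative usage with made-up log messages.
log = []
append_record(log, "admin login from console")
append_record(log, "snapshot created for vm-1")
print(verify_chain(log))
```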
9.3.4 Forensic and Other Investigative Support for Incident Analysis
(2) It is important that the customer understands the forensic requirements for conducting incident analysis, researches to what extent CSPs meet these requirements, chooses a CSP accordingly, and addresses any remaining gaps. The amount of potential evidence available to a cloud customer differs considerably between the different cloud service and deployment models.
(3) For IaaS services, customers can execute forensic investigations of their own virtual instances but will not be able to investigate network components controlled by the CSP. Furthermore, standard forensic activities such as the investigation of network traffic in general, access to snapshots of memory, or the creation of a hard disk image require investigative support to be provided by the CSP.
Advanced forensic techniques enabled by virtualization, such as generating snapshots of virtual machine states or VM introspection on live systems, also require forensic support by the CSP.
(4) With PaaS and SaaS security incidents that have their root cause in the underlying infrastructure, the cloud customer is almost completely reliant on analysis support of the CSP, and as mentioned previously, roles and responsibilities in IR must be agreed upon in the SLAs.
With PaaS, the customer organization will be responsible for any application layer code that is deployed to the cloud. Sufficient application logging is required for incident analysis in scenarios where the root cause lies within the application (e.g., a flaw in the application code).
In this case, support by the CSP could take the form of facilitating application log generation, secure storage, and secure access via a read-only API [10]. SaaS providers that generate extensive customer-specific application logs and provide secure storage as well as analysis facilities will ease the IR burden on the customer. This may reduce application level incidents considerably.
(5) Providers that use their management backplane/systems to scope an incident and identify the parts of a system that have been under attack or are under attack, and provide that data to the cloud customer, will greatly enhance the response in all service models.
(6) To prepare for incident analysis in a given cloud environment, the customer’s IR team should familiarize themselves with information tools the cloud vendor provides to assist the operations and IR processes of their customer.
Knowledge base articles, FAQs, incident diagnosis matrices, etc. can help fill the experience gap a cloud customer will have with regard to the cloud infrastructure and its operating norms. This information may, for example, assist the IR team in discriminating operational issues from true security events and incidents.
9.3.5 Containment, Eradication, and Recovery
(2) The strategies must also be consistent with business goals and seek to minimize disruption to service. This is considerably more challenging when multiple organizations are involved in the response, as is the case with cloud computing.
(3) Options for this phase will differ depending upon the deployment and service model, and also the layer of the stack at which the attack was targeted. There may be multiple strategies that can be employed, possibly by different entities equipped with different technological solutions.
(4) If at all possible, thought exercises should be conducted in the preparation phase to anticipate these scenarios and a conflict resolution process identified. Customers must also consider how their provider will handle incidents affecting the provider itself or affecting other tenants on a shared platform in addition to incidents that are directly targeted at their own organization.
(5) Consumers of IaaS are primarily responsible for the containment, eradication, and recovery from incidents. Cloud deployments may have some benefits here.
For example, isolating impacted images without destroying evidence can be achieved by pausing these images. As discussed in the introduction, the relative ease with which nodes can be shut down and new instances brought up may help to minimize service interruption when a code fix needs to be deployed.
If there are issues with a particular IaaS cloud, then the customer may have the option of moving the service on to another cloud, especially if they have implemented one of the meta-cloud management solutions.
(6) The situation is more complicated for SaaS and PaaS deployments. Consumers may have little technical ability to contain a Software or Platform as a Service incident other than closing down user access and inspecting/cleaning their data as hosted within the service prior to a later re-opening.
But especially for SaaS, even these basic activities may be difficult or impossible without adequate support by the CSP, such as fine-grained access control mechanisms and direct access to customer data (rather than via the web interface).
(7) In all service models, providers may be able to assist with certain categories of attack, such as a Denial of Service (DoS).
For example, smaller enterprises may benefit from the economies of scale, which allow for more expensive mitigation technologies, such as DoS protection, to be extended to their sites.
As for the previous phases, the extent to which facilities at the provider will be made available to the customer to assist in responding to an attack should be identified in the preparation phase.
In addition, the conditions under which the provider is obligated to provide assistance to responding to an attack should be contractually defined.
(8) The SLAs and the IR plan should be flexible enough to accommodate a “Lessons Learned” activity after the recovery.
A detailed Incident Report based on the IR activities is to be written and shared with impacted parties, i.e., between the cloud customer, CSP and other affected/involved organizations.
The Incident Report should include the timeline of the incident, analysis of the root cause or vulnerability, actions taken to mitigate problems and restore service, and recommendations for long-term corrective action.
(9) Corrective actions are likely to be a blend of customer-specific and provider supported, and the provider’s Incident Response team should provide a section with their perspective of the incident and proposed resolution.
After an initial review of the Incident Report by the customer and CSP, joint discussions should be held to develop and approve a remediation plan.
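The report contents listed above can be captured in a simple structure that both parties fill in and that makes gaps visible before the joint review. This is a minimal sketch; the class and field names are illustrative and do not represent a standard report format.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentReport:
    """Sections named in the text above; field names are illustrative."""
    timeline: list = field(default_factory=list)         # (timestamp, event) pairs
    root_cause: str = ""                                 # root cause or vulnerability
    actions_taken: list = field(default_factory=list)    # mitigation and restoration
    recommendations: list = field(default_factory=list)  # long-term corrective action
    provider_perspective: str = ""                       # section supplied by the CSP

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical draft with only two sections filled in.
draft = IncidentReport(
    timeline=[("2024-03-01T10:59:58Z", "suspicious login detected")],
    root_cause="reused administrator credential",
)
print(draft.missing_sections())
```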
9.4 Recommendations
(B) Cloud customers must set up proper communication paths with the CSP that can be utilized in the event of an incident. Existing open standards can facilitate incident communication.
(C) Cloud customers must understand the CSP's support for incident analysis, particularly the nature (content and format) of data the CSP will supply for analysis purposes and the level of interaction with the CSP's incident response team.
In particular, it must be evaluated whether the available data for incident analysis satisfies legal requirements on forensic investigations that may be relevant to the cloud customer.
(D) Especially in the case of IaaS, cloud customers should favor CSPs that leverage the opportunities virtualization offers for forensic analysis and incident recovery, such as access/roll-back to snapshots of virtual environments, virtual-machine introspection, etc.
(E) Cloud customers should favor CSPs that leverage hardware-assisted virtualization and hardened hypervisors with forensic analytic capabilities.
(F) For each cloud service, cloud customers should identify the most relevant incident classes and prepare strategies for the containment of, eradication of, and recovery from those incidents; they must ensure that each cloud provider can deliver the necessary assistance to execute those strategies.
(G) Cloud customers should obtain and review a CSP’s history for incident response. A CSP can provide industry recommendations from existing customers about its IRP.
9.5 Requirements
(B) The SLA of each cloud-service provider that is used must guarantee the support for incident handling required for effective execution of the enterprise incident response plan for each stage of the incident handling process: detection, analysis, containment, eradication, and recovery.
(C) Testing must be conducted at least annually. Customers should seek to integrate their testing procedures with those of their provider (and other partners) to the greatest extent possible.
Ideally, a team comprising customer and CSP members should carry out various health check tests of the incident response plan, and the resulting suggestions should be incorporated into a new version of the plan.