Author: Pash Hrnic, Field Solution Architect
Data is a precious resource for any organization. Although data protection is only one facet of an effective data management program, it is one of the most crucial. An organization’s ability to provision, classify, govern, process and store its data plays an important role in how well that data is protected. This article explores the next normal of data protection and shows that improved data management brings stronger means of protecting data.
Traditional data protection models
In traditional data protection models, structured and unstructured filesystems were protected with disk to tape or disk to disk solutions. If data was lost or service failed due to disaster, corruption or general hardware failure, the data would be rebuilt and recovered to where it needed to be.
As more business-critical applications were adopted and the cost of downtime was assessed, recovery time objectives1 (RTO) needed to improve to remain manageable. Organizations faced challenges such as database and application consistency, and application pools needed to be backed up in sync to maintain the consistency and integrity of data. Array-based snapshots2, storage-integrated replication technologies3 and 10G networks4 paved the way for rapid restores and the next generation of data protection solutions. Through this transition, the push for backup applications to integrate with storage systems had a positive impact on recovery point objectives5 (RPO), as snapshots can now be taken of an application and its data down to the minute. As organizations began to adopt cloud services,6 they were forced to look for data protection solutions that could span on-premises and cloud environments while maintaining similar (or better) RPOs and RTOs.
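To make the two objectives concrete, the sketch below measures the data loss window after a failure (the gap between the last usable snapshot and the failure) and checks it, together with a restore duration, against an RPO and an RTO. The function names, the 15-minute snapshot schedule and the objective values are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(snapshot_times, failure_time):
    """Time between the newest snapshot taken at or before the failure and the failure."""
    usable = [t for t in snapshot_times if t <= failure_time]
    if not usable:
        raise ValueError("no snapshot precedes the failure")
    return failure_time - max(usable)


def meets_objectives(loss, restore_duration, rpo, rto):
    """True when both the data loss and the restore time are within their objectives."""
    return loss <= rpo and restore_duration <= rto


# Hypothetical schedule: a snapshot every 15 minutes from 09:00 to 10:45.
start = datetime(2021, 6, 1, 9, 0)
snapshots = [start + timedelta(minutes=15 * i) for i in range(8)]

failure = datetime(2021, 6, 1, 10, 52)
loss = data_loss_window(snapshots, failure)

within = meets_objectives(
    loss,
    restore_duration=timedelta(minutes=40),  # assumed time to restore service
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
)
print(loss, within)
```

With a fixed snapshot interval, the worst-case data loss approaches the length of that interval, which is why more frequent snapshots translate directly into a lower achievable RPO.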
The modern approach
Modern workloads require an updated data protection solution. This is evident as the CDW Cloud report indicates that 56 percent of IT leaders have committed to a hybrid cloud strategy for business systems. Just ten years ago it was normal for organizations to run a good portion of their Tier 1 applications7 from bare metal servers8 (monolithic architecture)9. Fast forward to 2021 and the shift in business systems has accelerated rapidly. Virtual, hybrid, all-cloud and multicloud infrastructures, such as Infrastructure as a Service10 (IaaS), are becoming the norm, and more organizations are deploying their databases on dedicated platform offerings such as Platform as a Service11 (PaaS). Similarly, email, office communications, IT service management, HR and customer relationship tools are moving to Software as a Service12 (SaaS)-based models. Endpoints and mobile devices are surging in number, and organizations are decoupling their applications from monolithic server architecture in favour of containerization13. Understanding this shift is critical to identifying solutions that can span these architectures to protect services and data assets.
IT and business technology move at warp speed, which creates important opportunities for organizations to become more agile and efficient. One advantage of this evolution is operating at cloud speeds, which allows new features and improvements to existing data protection products to be rolled out regularly through service packs and version updates. The ability of tools to leverage new and existing cloud application programming interfaces14 (APIs) for integrations with SaaS, PaaS and IaaS is driving this rapid evolution and helping them keep up with modern IT workloads.
Data safety using the 3-2-1 Rule
To fully appreciate the modern approach and evolution of data protection, it is worth understanding how backup targets have evolved. Traditionally, the 3-2-1 Rule has been considered best practice for organizations to make copies of their data. The rule states that three copies of the data (the production data and two backup copies) should be stored on two different types of media, with one copy offsite. For example, a full backup would see an initial copy to disk, then a second copy written to tape, or a secondary storage appliance. Ideally, that tape and/or secondary storage appliance would be deployed offsite. This rule still holds true today, though the targets and media may look a little different.
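The rule as stated above can be expressed as three simple checks. The following is a minimal sketch, not a feature of any particular backup product; the `DataCopy` type, the media labels and the example copies are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str
    media: str     # hypothetical labels, e.g. "disk", "tape", "object"
    offsite: bool


def check_3_2_1(copies):
    """Return the parts of the 3-2-1 Rule that are violated (empty list = compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than two media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


# Production data plus two backups, one of them on tape and stored offsite.
compliant = [
    DataCopy("production", "disk", offsite=False),
    DataCopy("backup-disk", "disk", offsite=False),
    DataCopy("backup-tape", "tape", offsite=True),
]

# Three copies, but all on the same media type and in the same site.
non_compliant = [
    DataCopy("production", "disk", offsite=False),
    DataCopy("backup-a", "disk", offsite=False),
    DataCopy("backup-b", "disk", offsite=False),
]

print(check_3_2_1(compliant))
print(check_3_2_1(non_compliant))
```

Replacing the tape copy with an object storage copy in another region would satisfy the same checks, which reflects the point that the rule holds even as targets and media change.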
Modern data protection tools have incorporated flexibility on storage targets into their design. For instance, on-premises disk and tape libraries are still viable options, as is object storage15, whether it be a public cloud offering or a service deployed on-premises.
At a high level, the advantages of using object storage for backup data include:
- Speed – In contrast to hierarchical architectures, objects are stored in a flat address space, which can make access considerably faster.
- Scalability – The flat address space facilitates scalability, as organizations or their cloud object store provider can simply add extra nodes to increase capacity.16
As organizations look to replace legacy tape and disk technologies with more affordable and reliable solutions, the ability to write to object storage targets has become a desired feature in data protection products. For those using cloud-based workloads, it makes sense to keep the backup copies in a storage location that is close to the application workload, in order to maintain rapid restores and keep a low RTO. The architecture of modern cloud applications allows them to reside in one availability zone while the data is backed up to a geographically diverse location. For example, data can be stored in another cloud zone or region, with a different cloud provider or even on-premises.
The ultimate goal of the 3-2-1 Rule is to ensure reliable data protection by keeping a certain number of backup copies near-line to facilitate rapid restores, as well as an offsite copy of data that can be recalled in the event of complete site or systems failure.
So, what are the risks?
Data protection applications had to evolve to mitigate a new wave of risks. In addition to power outages and natural disasters, organizations are now more likely to suffer data loss from a systems failure, ransomware attack, human error or insider threat. In the battle against ransomware, backup solutions have become the last line of defence against bad actors.
Modern day backup suites have introduced technologies such as air-gapping17 and ‘write once read many’18 (WORM) copies. While WORM copies have existed for years, integrated and automated air-gapping is a relatively new concept that came about with the upsurge in ransomware attacks. Early air-gapping solutions wrote copies to removable media and kept them offline. More modern air-gapping solutions are integrated into backup applications and involve removing network accessibility to a dataset that is backed up. WORM copies completely deny changes to backup sets for a defined or indefinite period, and the restriction is enforced at the device and application level.
WORM copies make backup sets immutable and air-gapping makes them inaccessible, which leaves ransomware and similar types of cyberthreats with no practical way to alter or encrypt them.
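The retention behaviour of a WORM copy can be illustrated with a small model. This is a sketch of the logic only: real products enforce the lock in the storage device and the backup application, and the `WormBackupSet` class, its method names and the example dates below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class RetentionLockError(Exception):
    """Raised when a change is attempted on a locked backup set."""


class WormBackupSet:
    """Hypothetical model of a backup set under a WORM retention lock."""

    def __init__(self, name: str, written_at: datetime,
                 retention: Optional[timedelta]):
        self.name = name
        self.written_at = written_at
        self.retention = retention  # None models an infinite lock
        self.deleted = False

    def is_locked(self, now: datetime) -> bool:
        if self.retention is None:
            return True
        return now < self.written_at + self.retention

    def delete(self, now: datetime) -> None:
        # Any change, including deletion, is refused while the lock is active.
        if self.is_locked(now):
            raise RetentionLockError(f"{self.name} is locked")
        self.deleted = True


def try_delete(backup: WormBackupSet, now: datetime) -> str:
    try:
        backup.delete(now)
        return "deleted"
    except RetentionLockError:
        return "refused"


written = datetime(2021, 1, 1)
monthly = WormBackupSet("monthly-full", written, timedelta(days=30))
archive = WormBackupSet("legal-archive", written, None)

results = [
    try_delete(monthly, written + timedelta(days=10)),    # inside the lock period
    try_delete(monthly, written + timedelta(days=31)),    # lock has expired
    try_delete(archive, written + timedelta(days=3650)),  # infinite lock
]
print(results)
```

The point of the model is that the decision depends only on the write time and the retention period, not on who is asking, so compromised administrator credentials do not help an attacker remove a locked copy.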
Another solution to counter both ransomware and insider threats is anomalous behaviour detection.19 This solution functions by leveraging artificial intelligence and machine learning to identify patterns in users’ interactions with files. If the pattern is unusual and deviates from normal operations carried out by that user, automated actions can be executed to either alert system administrators or bar access to data.
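The baseline-and-deviation idea can be sketched with plain statistics. Note the substitution: commercial tools use machine learning models, whereas the example below uses a simple z-score against a user's own history as a stand-in. The function names, thresholds and sample counts are hypothetical.

```python
from statistics import mean, stdev


def deviation_score(history, observed):
    """How many standard deviations the observed value lies from the user's baseline."""
    if len(history) < 2:
        raise ValueError("need at least two baseline observations")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def decide_action(history, observed, alert_at=3.0, block_at=6.0):
    """Map the deviation to one of the automated responses described above."""
    score = deviation_score(history, observed)
    if score > block_at:
        return "block"   # bar access to data
    if score > alert_at:
        return "alert"   # notify system administrators
    return "allow"


# Hypothetical baseline: files modified per day by one user over a week.
baseline = [40, 52, 47, 45, 50, 43, 48]

actions = {
    "normal day": decide_action(baseline, 51),
    "unusual day": decide_action(baseline, 62),
    "mass change": decide_action(baseline, 5000),
}
print(actions)
```

A sudden jump to thousands of modified files, as happens when ransomware encrypts a file share, stands far outside the baseline and triggers the stricter response, while ordinary day-to-day variation does not.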
Simplifying data protection
Data protection applications have evolved to support a wider array of features and have become much simpler than legacy tools in the process. Data protection infrastructure elements can now be installed across physical and virtual appliances, on-premises and across cloud provider IaaS or marketplace instances to protect cloud-based workloads.
When it comes to authoring protection and data lifecycle policies, a simple approach is best. Historically, the imperative approach was to write a policy that contained all the steps required to complete a backup, such as data source, storage target and schedule. A declarative approach is simpler: it defines what the end state of the backup should be and then has the end user decide which service level agreement (SLA) the data should be assigned. Thus, in a declarative policy, elements of the backups are defined once, and data assets are associated with policies based on their required SLA.
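A declarative policy of this kind can be sketched as data rather than as a list of steps. In the example below, the tier names, frequencies, retention periods, target labels and asset names are all hypothetical; the point is only that each SLA is defined once and assets are simply mapped to a tier.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaPolicy:
    """The desired end state for any asset assigned to this tier."""
    name: str
    backup_every: timedelta
    retain_for: timedelta
    targets: tuple


# Each policy is defined once.
POLICIES = {
    "gold": SlaPolicy("gold", timedelta(minutes=15), timedelta(days=90),
                      ("local-disk", "cloud-object")),
    "bronze": SlaPolicy("bronze", timedelta(days=1), timedelta(days=30),
                        ("cloud-object",)),
}

# The end user only decides which SLA each data asset needs.
ASSIGNMENTS = {
    "erp-database": "gold",
    "file-share": "bronze",
}


def policy_for(asset: str) -> SlaPolicy:
    """Look up the end state a given asset should be held to."""
    return POLICIES[ASSIGNMENTS[asset]]


print(policy_for("erp-database"))
print(policy_for("file-share"))
```

Changing the retention period of the gold tier in one place updates the expected state of every asset assigned to it, which is the main practical advantage over per-job imperative definitions.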
The next normal
Data is the crown jewel for most organizations today. Like a precious stone, data can be cut down and processed, broken down into smaller gems and reconstructed to form a larger adornment. Each piece of that jewel always needs to be accounted for and should be protected from counterfeit and theft. It is only through effective management and protection that an organization can truly benefit and realize the value of its most important asset – its data. The next normal and continued evolution of IT workloads and rapid data growth is just one more step in the journey toward effective data protection and management.
To learn more about the next normal of data protection or to speak to a CDW expert about a customized data protection solution, please visit cdw.ca/datacentre.
10G: An internet speed delivering 10 gigabits per second.
Air-gapping: Air-gapped computer systems contain highly sensitive data and are connected via private networks that are completely isolated from the Internet. A firewall between the network and the Internet does not make the network air-gapped. It has to be physically isolated (“air in between”).
Anomalous behaviour detection: An approach to intrusion detection that establishes a baseline model of behavior for users and components in a computer system or network. Deviations from the baseline cause alerts that direct the attention of human operators to the anomalies.
Application programming interfaces (APIs): Allow one computer program to make its data and functionality available for other programs to use. Developers use APIs to connect software components across a network.
Array-based snapshot: A copy of the image of a running virtual machine or application server at a specific point in time.
Bare-metal server: In cloud computing, a bare-metal server is a non-shared computer dedicated to one customer. It generally implies a non-virtual machine environment.
Cloud application programming interface (API): The software interface that allows developers to link cloud computing services together.
Cloud services: Infrastructure, platforms or software that are hosted by third-party providers and made available to users through the internet.
Containerization: A form of operating system virtualization, through which applications are run in isolated user spaces called containers, all using the same shared operating system. A container is essentially a fully packaged and portable computing environment.
Infrastructure as a Service (IaaS): A cloud computing service that provides a basic computing platform, typically the hardware and virtual machine infrastructure (no operating system) or the hardware and an operating system.
Monolithic architecture: Single-tiered software in which different components are combined into a single program from a single platform.
Object storage or object-based storage: A computer data storage architecture that manages data as objects, as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks.
Platform as a Service (PaaS): A cloud computing service that provides a comprehensive computing environment. PaaS includes the hardware, operating system, database and other necessary software for the execution of applications. It may include a complete development environment as well. PaaS is a step up from “infrastructure as a service” which provides only the servers and operating systems.
Recovery point objective (RPO): The maximum acceptable amount of data loss measured in time. It is the age of the files or data in backup storage required to resume normal operations if a computer system or network failure occurs.
Recovery time objective (RTO): The maximum desired length of time allowed between an unexpected failure or disaster and the resumption of normal operations and service levels. The RTO defines the point in time after a failure or disaster at which the consequences of the interruption become unacceptable.
Software as a Service (SaaS): Software that is rented rather than purchased. Instead of buying applications and paying for periodic upgrades, SaaS is subscription based, and upgrades are automatic during the subscription period. When that expires, the software is no longer valid.
Storage-integrated replication technology: An approach to replicate data available over a network to numerous distinct storage locations.
Tier 1 application: An information system that is vital to the running of an organization.
Write Once Read Many (WORM): There are two kinds of optical drive technologies that prevent files from being rewritten. The traditional ablative WORM makes a permanent change in the recording material. Continuous composite write is a WORM mode in a normally rewritable magneto-optical cartridge. The drive’s firmware ensures that recorded areas on the medium are not rewritten.
1. Techopedia, Recovery Time Objective (RTO) – What Does Recovery Time Objective (RTO) Mean? – https://www.techopedia.com/definition/24250/recovery-time-objective–rto
2. Chris Evans, Computer Weekly, Snapshots: Hypervisor vs Array – https://www.computerweekly.com/feature/Snapshots-Hypervisor-vs-Array
3. Vishal Agrawal, HEVO, Data Replication Storage: A Comprehensive Guide – https://hevodata.com/learn/replication-storage-a-comprehensive-guide/#unst
4. Computerworld, Everything you need to know about 10G, the future of broadband technology – https://www.computerworld.com/article/3448623/everything-you-need-to-know-about-10g-the-future-of-broadband-technology.html
5. Techopedia, Recovery Point Objective (RPO) – What Does Recovery Point Objective (RPO) Mean? – https://www.techopedia.com/definition/1032/recovery-point-objective-rpo
6. RedHat, What are cloud services? – https://www.redhat.com/en/topics/cloud-computing/what-are-cloud-services
7. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/tier-1-application
8. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/bare-metal-server
9. Siraj ul Haq, Medium, Introduction to Monolithic Architecture and MicroServices Architecture – https://medium.com/koderlabs/introduction-to-monolithic-architecture-and-microservices-architecture-b211a5955c63
10. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/iaas
11. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/paas
12. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/saas
13. Citrix, What is containerization and how does it work? – https://www.citrix.com/solutions/app-delivery-and-security/what-is-containerization.html
14. Techopedia, Cloud Application Programming Interface (Cloud API) – What Does Cloud Application Programming Interface (Cloud API) Mean? – https://www.techopedia.com/definition/26437/cloud-application-programming-interface-cloud-api
15. Wikipedia, Object storage – https://en.wikipedia.org/wiki/Object_storage
16. CloudTweaks, Why you should consider object storage for your backups – https://cloudtweaks.com/2018/02/consider-object-storage-backups/
17. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/air-gapped
18. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/worm
19. PC Magazine Encyclopedia – https://www.pcmag.com/encyclopedia/term/anomaly-detection