Data Backup Related Terminology
The term data backup refers either to the activity of copying computer files for retention or disaster recovery, or to the repository of copies made for those purposes. Data backup (the activity) is usually accomplished by copying files to tape, disk, or another medium.
Two key concepts relate to data backup. First, choosing between local and offsite storage of the data is always a consideration. An offsite backup provides protection from a disaster that affects your entire place of business, such as a fire, flood, tornado, hurricane, or earthquake. Offsite storage is now generally accomplished through online storage. Please see our definition of Online Storage.
Second, data backup generally includes a retention policy. A retention policy defines the period of time for which the additional copy will be kept. An example of a typical retention policy, used for Global Data Vault’s Advanced Data Protection service, is shown below. For 7 days, users can go back to any individual point at which a data backup was taken; for most customers this is hourly. Customers set the policy themselves, but we recommend the default: hourly from 8am to 6pm, Monday to Friday. For 14 days, users can recover any individual day, and within the most recent 7 of those days they can also access any hour, as described above. For 30 days, users can choose a particular week. After that, users can choose a specific month.
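The tiered schedule above amounts to a pruning rule applied to the list of backup points. The Python sketch below is a hypothetical illustration, not Global Data Vault’s implementation: the function name `backups_to_keep` is made up, the tier boundaries (7, 14 and 30 days) come from the example policy, and bucketing weeks by ISO week and keeping the newest backup in each day, week or month are our assumptions about how each period is represented.

```python
from datetime import datetime, timedelta


def backups_to_keep(timestamps, now):
    """Return the backup points retained by a tiered policy (hypothetical sketch).

    - under 7 days old: keep every backup point
    - 7 to 14 days old: keep one backup per calendar day
    - 14 to 30 days old: keep one backup per ISO week
    - 30 days or older: keep one backup per calendar month
    Within each day/week/month bucket the newest backup is kept (an assumption).
    """
    keep = set()
    newest_in_bucket = {}
    for ts in sorted(timestamps):  # oldest first
        age = now - ts
        if age < timedelta(days=7):
            keep.add(ts)
            continue
        if age < timedelta(days=14):
            bucket = ("day", ts.date())
        elif age < timedelta(days=30):
            iso = ts.isocalendar()
            bucket = ("week", iso[0], iso[1])
        else:
            bucket = ("month", ts.year, ts.month)
        # Visiting oldest-first means the last assignment is the newest in the bucket.
        newest_in_bucket[bucket] = ts
    keep.update(newest_in_bucket.values())
    return sorted(keep)


# Usage: 60 days of hourly backups, pruned as of a fixed "now".
now = datetime(2024, 3, 1, 12, 0)
stamps = [now - timedelta(hours=h) for h in range(60 * 24)]
kept = backups_to_keep(stamps, now)
```

Storing only the pruned set is what keeps long retention affordable: the hourly points from the last week stay intact, while older history collapses to one point per day, week or month.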
A remote backup is a copy of a data set that is stored at a remote location for security purposes. The benefit of having a remote backup is that it protects the business in the event of a site disaster, such as a fire, flood, tornado, earthquake, or tsunami. It is generally recommended that a remote backup be located at least 50 miles from the site where the primary backup is stored, so that a disaster affecting the primary site is unlikely to also affect the remote backup site.
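Checking the 50-mile guideline for a candidate site comes down to computing the distance between two locations. A minimal sketch, assuming the sites are given as latitude/longitude pairs in degrees; it uses the standard haversine great-circle formula with a mean Earth radius of 3,958.8 miles, and the function names and sample coordinates are illustrative only. It measures straight-line distance and says nothing about whether two sites share other regional risks.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def far_enough(primary, remote, min_miles=50.0):
    """True if the remote site is at least min_miles from the primary site."""
    return distance_miles(*primary, *remote) >= min_miles


# Hypothetical sites: a primary office in New York City, a remote site in Philadelphia.
primary_site = (40.7128, -74.0060)
remote_site = (39.9526, -75.1652)
```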
Online backup refers to making backups over the internet, or “online”. Because of the security concerns associated with the internet, online backups should always be encrypted at the primary site, before the data leaves it. Because most internet connections are not fast enough to move large amounts of data routinely, online backup is usually accomplished by moving the data once in its entirety and then, on a routine basis, moving only the data that has changed since the last backup.
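The “send everything once, then only the changes” pattern depends on detecting which files changed. The sketch below assumes change detection by comparing SHA-256 content hashes against a manifest saved from the previous run; real backup products may instead rely on timestamps, archive flags or block-level change tracking. The function names are hypothetical, deleted files are not handled, and the encryption step is omitted.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its content digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def changed_since(previous, current):
    """Files that are new or whose contents differ from the previous manifest."""
    return sorted(path for path, digest in current.items() if previous.get(path) != digest)
```

On the first run the previous manifest is empty, so every file is “changed” and the whole data set is sent; on later runs only new or modified files qualify.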
Email backup refers to keeping a second copy of the data stored in an email system. That data includes not only messages but also attachments, contact information, and tasks and activities. Email is one of the most important applications for many organizations, so proper email backup is critical. At the same time, because email systems are so complex, backing them up properly is also a complex task.
The term system backup usually refers to a copy of a server that includes all of the programs and data installed or stored on it. System backups are designed so that the programs and data can be restored to that server without any other process being required to return the server to a usable condition after a failure. In contrast, a data backup includes only data. If a system protected only by data backup suffers a serious failure, it is generally necessary to first reinstall the operating system and application programs and then restore from the data backup.
A disc backup is a second copy of all of the data from a specific disc. Disc may refer to a CD, DVD, hard disk, RAID array, or NAS storage. Hard disks are rigid magnetic platters housed in an enclosed case and used to store data. The term RAID Array is defined separately in this glossary. NAS refers to Network Attached Storage, which is exactly what the name implies: data storage that is accessed through its attachment to a computer network, usually an Ethernet network.
A network backup is a backup that is made within the boundaries of a single network, usually a Local Area Network, or LAN. Network backup is easy to implement, but it is prone to unreliability and often consumes significant network resources. Network backup protects data only against failures of part of the network because, by definition, the redundancy it provides is local to that network.
A full backup is a complete copy of the data being protected. A full backup differs from an incremental or differential backup in that an incremental or differential backup does not, on its own, contain all of the data being protected. A full backup is by definition a backup that, when restored, can return a system completely to its condition at the time the backup was taken.
An incremental backup is a subset of a full backup. The data in an incremental backup consists only of the data that has changed since the last backup operation of any kind. If an organization relies on incremental backup to protect its data, then to restore data you must restore the most recent full backup and then also restore, in order, every incremental backup completed after that full backup.
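To see why every incremental is needed, the sketch below models a protected system as a Python dict mapping file names to contents. This is a simplified model with hypothetical helper names, and it ignores deleted files.

```python
def take_full(state):
    """Full backup: a complete copy of every file."""
    return dict(state)


def take_incremental(state, last_backup_state):
    """Incremental backup: only files changed since the previous backup of any kind."""
    return {k: v for k, v in state.items() if last_backup_state.get(k) != v}


def restore_incremental(full, incrementals):
    """Restore the latest full backup, then every later incremental, oldest first."""
    restored = dict(full)
    for inc in incrementals:
        restored.update(inc)
    return restored


monday = {"a": 1, "b": 1}
full = take_full(monday)
tuesday = {"a": 2, "b": 1}
inc1 = take_incremental(tuesday, monday)     # holds only the change to "a"
wednesday = {"a": 2, "b": 3}
inc2 = take_incremental(wednesday, tuesday)  # holds only the change to "b"
```

Restoring the full backup plus both incrementals reproduces Wednesday’s state; skipping Tuesday’s incremental silently leaves the old value of `"a"` in place.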
A differential backup is a subset of a full backup. The data in a differential backup consists only of the data that has changed since the last full backup operation. If an organization relies on differential backup to protect its data, then to restore data you must restore the most recent full backup and then also restore the latest differential backup. Incremental and differential backups were developed to address the difficulty of backing up large systems to tape. Tape systems are slow and costly, and both incremental and differential backups offer a way to shorten the daily backup process.
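The sketch below models a protected system as a Python dict mapping file names to contents (a simplified model with hypothetical helper names that ignores deleted files). A differential is computed against the last full backup rather than the last backup of any kind, so a restore needs only two pieces.

```python
def take_differential(state, full_backup):
    """Differential backup: every file changed since the last FULL backup."""
    return {k: v for k, v in state.items() if full_backup.get(k) != v}


def restore_differential(full, latest_differential):
    """Restore the latest full backup plus only the most recent differential."""
    restored = dict(full)
    restored.update(latest_differential)
    return restored


full = {"a": 1, "b": 1, "c": 1}  # e.g. a weekend full backup
monday = {"a": 2, "b": 1, "c": 1}
tuesday = {"a": 2, "b": 3, "c": 1}
diff_mon = take_differential(monday, full)   # the change to "a"
diff_tue = take_differential(tuesday, full)  # the changes to "a" and "b"
```

The trade-off shows in the sizes: each differential repeats every change since the full backup, so it grows until the next full backup, while each incremental stays small but lengthens the restore chain.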
Copy backup is a backup made by simply copying data. Copy backup is an easy way to make a backup, but it is a highly manual process, and it provides multiple versions of the protected data only to the extent that those versions are maintained manually.
Having multiple versions in a retention policy or retention cycle is important because it allows recovery to different points in time, which is valuable for two reasons. First, data may be corrupted some time before the problem is noticed, and a subsequent backup may overwrite the only clean copy of the data from before the corruption. Second, it is frequently important to restore large data sets collectively, as they were at a specific earlier time. Financial accounting data is an example of this second reason. If you restore all the invoices your customer has not paid from one point in time but restore the customer’s balance due from another point in time, you have lost the consistency between related records, often loosely called referential integrity, and your data is in conflict. You don’t know which data should be viewed as correct.
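The accounting example can be made concrete with a small consistency check. The figures, the snapshot times and the rule that the balance due equals the sum of unpaid invoices are all hypothetical, chosen only to show how mixing points in time breaks the data.

```python
# Two hypothetical point-in-time snapshots of one customer's accounting records.
snapshots = {
    "09:00": {"unpaid_invoices": [100, 250], "balance_due": 350},
    "17:00": {"unpaid_invoices": [100, 250, 400], "balance_due": 750},
}


def restore(invoices_from, balance_from):
    """Restore invoices and balance, possibly from different points in time."""
    return {
        "unpaid_invoices": list(snapshots[invoices_from]["unpaid_invoices"]),
        "balance_due": snapshots[balance_from]["balance_due"],
    }


def is_consistent(record):
    """Assumed invariant: the balance due equals the sum of the unpaid invoices."""
    return sum(record["unpaid_invoices"]) == record["balance_due"]
```

Restoring both pieces from the same snapshot passes the check; taking invoices from 09:00 and the balance from 17:00 fails it, which is exactly the conflict described above.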
A daily backup is a copy of programs and/or data files made once each day. The term also refers to the activity of making a backup copy of data each day. Daily backups have been a routine part of computer systems management since the advent of business and application computing. They are normally performed at the end of the day, which gives an organization the option to restore its data as it was at the end of the previous day.
Data Protection Terminology
RAID is an acronym for Redundant Array of Inexpensive Disks (often also expanded as Redundant Array of Independent Disks). RAID provides a means to improve the reliability of disk storage while still using low-cost disk drives. This is accomplished by arranging the disk drives into arrays and storing redundant information across the drives. Storage systems using RAID are referred to as RAID arrays. RAID arrays distribute data across multiple disks, yet computers that access the array see the data as if it were on a single disk.
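One common way RAID creates redundancy is parity: RAID 5, for example, stores the XOR of the data blocks in each stripe so that any single lost block can be rebuilt from the rest. The sketch below shows only that arithmetic on one stripe with made-up block contents; real arrays also rotate parity across disks and work on far larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe spread across three data disks, plus one parity block.
data = [b"ACCT", b"PAYR", b"MAIL"]
parity = xor_blocks(data)

# If the disk holding data[1] fails, XOR-ing the surviving data blocks
# with the parity block reproduces the missing block.
recovered = xor_blocks([data[0], data[2], parity])
```

This works because XOR-ing a value with itself cancels it out, so the surviving blocks and the parity combine to leave only the missing block. A single parity block can repair only one lost disk per stripe.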
Data cloning is a process in which virtual servers are copied completely. Virtual servers are isolated, encapsulated, software-based versions of physical servers that operate in a virtual layer, separated from direct access to the server hardware. Because virtual servers are separated from the physical hardware, copying all of their programs and data to a new cloned server is relatively easy and fast.
Normal backup refers to an organization’s routine data backup process. Data backup exists to allow recovery of data as it existed at an earlier point in time. Such a recovery is only necessary if data has been lost, but in our studies we have found that one in five organizations lose data that needs to be restored in a given six-month period.
Data partitioning is the practice of segregating data into separate storage units or partitions. Partitioning protects data by placing separate data sets on physically separate hardware, so that a hardware failure affecting one partition does not affect the others.
Data striping is the technique of storing parts of a file on separate physical devices, usually disk drives within the same RAID array. Striping is logically complex and is often combined with redundancy techniques such as parity. Combined in this way, striping provides greater protection for the data as well as faster reading and writing. Striping is sometimes used together with mirroring.
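A RAID 0-style round-robin layout illustrates the striping part on its own. The sketch is hypothetical (tiny 2-byte blocks, made-up function names) and includes no redundancy: consecutive blocks land on different disks, which is what lets reads and writes proceed in parallel, but losing any one disk here loses data.

```python
def stripe_location(block_number, num_disks):
    """Round-robin layout: which disk holds a logical block, and at what offset."""
    return block_number % num_disks, block_number // num_disks


def stripe(data, num_disks, block_size):
    """Split data into fixed-size blocks and deal them across the disks in turn."""
    disks = [[] for _ in range(num_disks)]
    for i in range(0, len(data), block_size):
        disk, _ = stripe_location(i // block_size, num_disks)
        disks[disk].append(data[i:i + block_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading logical blocks 0, 1, 2, ... in order."""
    total = sum(len(d) for d in disks)
    out = bytearray()
    for n in range(total):
        disk, offset = stripe_location(n, len(disks))
        out += disks[disk][offset]
    return bytes(out)


layout = stripe(b"ABCDEFGH", num_disks=3, block_size=2)
```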
Data mirroring is a technique in which all disk writes are duplicated simultaneously to two separate disk drives. This creates an identical pair of disk drives, and if one drive fails, the other provides all the required data. The risk of using mirrored drives is that a software failure or data corruption is just as easily mirrored to both drives, leaving the user with no clean copy of the data.
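The sketch below models a mirrored pair with a hypothetical `MirroredPair` class to show both sides of the trade-off: a drive failure is survived, but a bad write lands on both drives. Drives are modeled as Python dicts of block number to data, which is a deliberate simplification.

```python
class MirroredPair:
    """Hypothetical model: every write goes to both drives at once."""

    def __init__(self):
        self.drives = [{}, {}]

    def write(self, block, data):
        # Duplicate the write to every drive that is still working.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def fail_drive(self, index):
        self.drives[index] = None

    def read(self, block):
        # Serve the read from any drive that is still working.
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise OSError("both drives have failed")


# Hardware failure: the surviving drive still has the data.
pair = MirroredPair()
pair.write(0, b"good data")
pair.fail_drive(0)

# Software failure: a bad write is faithfully copied to both drives.
pair2 = MirroredPair()
pair2.write(0, b"good data")
pair2.write(0, b"corrupted!")
```

Nothing in the mirror can tell a legitimate write from a corrupting one, which is why a versioned backup is still needed alongside mirroring.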
Data security is a broad term that refers to the practices and tools used to protect the information stored in information technology systems, information that is under increasing threat today.
The term disaster recovery covers broad ground. Here, we define disaster recovery as it applies to Information Technology systems: the process of restoring Information Technology systems to a proper operating state following a failure, specifically a failure brought on by a disaster. In a broader context, disaster recovery also refers to restoring all of the other resources required for an organization to function fully.
Application recovery is the process of restoring an application to proper functioning following a failure. Application recovery plans are generally focused on server backups and server mirroring. Application recovery systems are built specifically for functional silos like email, database applications, financial systems, and customer facing systems.
Business Continuity Planning
Business continuity planning is the process of developing plans to address these questions: What risks does an organization face if its people, systems, or data are unavailable for a period of time? What impact would this have on the effectiveness and/or survival of the business? How should the organization respond if some or all of these resources are unavailable? What is the business’s recovery timeframe, and what recovery timeframe is acceptable to its essential stakeholders?
HIPAA is the acronym for the Health Insurance Portability and Accountability Act of 1996. This law provides protection for workers who change or lose their jobs by requiring insurers and employers to offer coverage to these workers. HIPAA also includes extensive privacy rules that protect the privacy of individuals and govern the disclosure of health-related information.
Sarbanes-Oxley (also known as SOX) is the law enacted by the U.S. Congress in 2002, in the wake of the Enron, Tyco and WorldCom scandals, that strengthened financial disclosure and reporting requirements for publicly held companies. Sarbanes-Oxley imposes stiff civil and criminal penalties on both management and independent auditors for misleading or inaccurate reporting, and it requires a higher level of responsibility from senior management for the fairness of information reported in required disclosures.
SSAE 16 (formerly SAS 70) Type 2
SAS 70, or Statement on Auditing Standards No. 70, was promulgated by the Auditing Standards Board of the American Institute of Certified Public Accountants (AICPA) in 1992. SAS 70 establishes a reporting standard and defines the tests required to audit internal controls, providing a uniform way to test and report on the controls in place in a company or organization. SSAE 16 replaced SAS 70 for reports on service organizations in 2011. Having good internal controls provides a higher level of assurance on the security and accuracy of information stored or reported by the organization.
SOX is a popular acronym for the Sarbanes-Oxley Act. SOX established the Public Company Accounting Oversight Board, which provides independent review and oversight of public accounting firms in their performance of the attest function as auditors of publicly held businesses. SOX establishes strict guidelines to ensure that the firms that audit public companies are independent both in fact and in appearance. SOX additionally mandates strict requirements for corporate responsibility, financial disclosures, and conflicts of interest among securities analysts.
Server Related Terminology
The term server platform refers to the operating system that runs on the server hardware. The operating system is the software that enables applications to communicate with and control hardware. Microsoft’s Windows Server, VMware’s vSphere, and Linux are all examples of commonly used server platforms. Occasionally, server platform also refers to the architecture of the server itself, but because the “Intel” architecture is so dominant, the term platform generally refers to the operating system.
An application server is a centralized computer that supports multiple users and performs computing services specific to applications. Typical applications run on application servers include CRM, ERP, financial accounting, etc. Organizations normally use application servers when multiple users need access to a common application or system.
A mail server is a shared computer that provides services required to support email for an organization. These services include storage of messages as well as sending, receiving, and forwarding messages. Many mail servers also support group calendar management, contact management, task lists, and other productivity tools. Mail servers are complex and require precise configuration. They are also frequent targets of malicious attacks and therefore require careful security controls.
A file server is a shared computer that stores data files for multiple users. Common files stored on file servers include word processing files, spreadsheets, presentations, flat file databases, graphic images, etc. Keeping this type of data on a dedicated file server provides higher levels of security, redundancy and control.
A list server is a server that maintains and supports electronic mailing lists which usually contain email and other contact information for the members of organizations and/or groups. The data in list servers should be carefully protected as these servers are also subject to hacking and malicious attacks.
A backup server is a server that manages data backup and the related data storage for other servers and workstations. Backup servers are normally dedicated specifically to the task of supporting data protection, and backup and restore functionality for other computers. As an online backup provider, Global Data Vault operates a series of backup servers which each support and manage the data protection and online backup of hundreds to thousands of servers, workstations and end-user computers.
The term mirrored server refers to a technology used for data protection. Two or more servers are configured using special software to copy each other in near real-time. If one of the mirrored servers fails, the data is protected and still available on the server that did not fail. Mirroring is an effective but expensive method to protect application and business continuity. Whenever mirroring is used, backup should also be used, because corruption can be mirrored from the primary server to the mirrored server. The popular social bookmarking site Ma.gnolia failed in exactly this scenario and was unable to recover.
A web server is a computer that is connected to the internet and used to store and deliver web pages and websites to internet visitors. Two widely used web server platforms are IIS from Microsoft and the open-source Apache. Hundreds or thousands of users can connect to a web server simultaneously. Operators of busy websites often use an entire network of servers to function as a web server, which can serve a single high-traffic site or, alternatively, many smaller sites.