Remember Me? Why We Mustn’t Forget About Data Archiving.
44 Trillion Gigabytes
The problem with data archiving is that it doesn’t seem very interesting or important until…well…it is! This could be when you’re due for a compliance audit, you need to access specific historical data for testing, or you need it as evidence in a lawsuit. Of course, it’s a time-consuming process that needs to be carefully strategised and carried out in anticipation of these needs, and the penalties for not doing so are high. We are undeniably in an era of data explosion and, according to market research company IDC, our “Digital Universe” of data is doubling in size every two years and, by 2020, the data we create and copy annually will reach 44 zettabytes (44 trillion gigabytes)! It’s clear then that we need a strategy to create space for new data, while ensuring that we keep hold of existing information that’s useful or required to be kept by law.
Redefining Data Archiving
Even defining data archiving can be tricky, as traditionally it was often used to refer to backup or email archives. Now, the first step to data archiving success is to redefine what it means for your business, so you can properly utilise the vast quantities of unstructured data produced by SMACT, IoT, BYOD and the consumerisation of IT. It’s also essential to have a clear definition so that all personnel understand what you are trying to achieve. Of course, backup and archive are related, and your strategy should include both to achieve optimal results.
In simple terms, data archiving is the process of moving structured and unstructured information that is not in frequent use off primary, performance-based systems and onto secondary, high-capacity systems for long-term retention. The data needs to be easy to search and simple to retrieve, so you can access information fast when you need it for business-critical activities such as testing and QA.
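As a minimal illustration of that definition, the sketch below moves files that haven’t been modified for a set number of days from a primary location to a secondary one, preserving their relative paths so they remain easy to locate and retrieve. All names, paths and the age threshold here are illustrative assumptions, not part of any specific product.

```python
import shutil
import time
from pathlib import Path


def find_archive_candidates(primary: Path, max_age_days: int) -> list[Path]:
    """Return files under `primary` not modified within `max_age_days`
    (i.e. data that is no longer in frequent use)."""
    cutoff = time.time() - max_age_days * 86400
    return [p for p in primary.rglob("*")
            if p.is_file() and p.stat().st_mtime < cutoff]


def archive(files: list[Path], primary: Path, secondary: Path) -> None:
    """Move each stale file to secondary storage, keeping its relative
    path so it stays simple to search for and retrieve later."""
    for f in files:
        dest = secondary / f.relative_to(primary)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(dest))
```

In practice the “secondary” location would be a high-capacity archive platform rather than a local folder, but the principle — select by usage, move, and preserve retrievability — is the same.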
Insights-Driven for a Competitive Edge
You may be surprised to know that there are a huge number of significant business benefits to data archiving, beyond just complying with regulations and being more organised.
Insights-driven businesses have a competitive edge based on smart, knowledge-based decisions resulting in greater innovation, more opportunities, decreased operational costs and revenue stability. Losing data can incur large costs in recreating it when you discover it’s needed, so it’s much better to ensure that it’s properly stored and easily retrieved. Primary storage is generally more expensive than archive systems. Poor quality or hard to access test data leads to production defects, poor quality products and services and reputational damage that you simply can’t afford in this customer experience focused market. So, if you have a lot of currently irrelevant and unused data on your main system, you’re needlessly sacrificing performance and wasting money, time and space daily, holding it there and backing it up unnecessarily.
Archiving also enables you to easily rediscover data pertaining to innovative ideas, studies and surveys that were not used or seemed exhausted at the time but have become relevant to the success of your business again, over time.
What, Where and Why?
A lot of organisations struggle to identify what data should be archived and what needs to be left on the primary storage platform. Historically, in sectors where intervention by regulatory bodies is high and requirements are stringent, such as Finance and Insurance, there was a temptation not to archive any information in case it was needed. This put legacy systems under incredible pressure, resulting in performance and cost inefficiencies. However, new regulations with high penalties for non-compliance, such as the GDPR, now mean this is no longer possible and, from a wider business perspective, it’s not desirable anyway. To commence your data archiving process, you need to start with 12 basic questions that apply to all businesses:
- Where are your different types of data currently stored and what is the quality of that data?
- Is that data personal or non-personal?
- Is it structured, unstructured or a combination?
- Who uses it, how often and what for?
- How long does each type need to be accessed or stored?
- What are the legal requirements for each type of data?
- How will the data files, directory systems and past versions be inventoried, searched, located and retrieved?
- How will you approach data migration and what policies do you need to create to automatically snapshot, verify and re-export data?
- What will your user interface look like?
- How will your system report on archive status?
- Will you archive on premise or in the cloud or with a hybrid solution?
- Does the solution need to support multiple vendors?
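The answers to questions like these can be captured as a simple policy table that drives automated decisions. The sketch below is a hypothetical example — the data types, retention periods and storage tiers are invented for illustration and would come from your own legal and data management review.

```python
from datetime import timedelta

# Hypothetical policy table: each data type maps to its storage tier,
# its retention period and whether it contains personal data (which
# matters for GDPR obligations such as data minimisation).
RETENTION_POLICY = {
    "invoices":      {"store": "archive", "keep": timedelta(days=365 * 7), "personal": True},
    "server_logs":   {"store": "archive", "keep": timedelta(days=90),      "personal": False},
    "test_datasets": {"store": "primary", "keep": timedelta(days=365),     "personal": False},
}


def disposition(data_type: str, age: timedelta) -> str:
    """Decide what to do with an item of a given type and age:
    keep it on 'primary', move it to 'archive', or 'delete' it."""
    rule = RETENTION_POLICY[data_type]
    if age > rule["keep"]:
        return "delete"      # retention period expired
    return rule["store"]     # keep on primary or hold in archive
```

Encoding the policy as data rather than scattering it through code makes it far easier to review with legal experts and to update when regulations change.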
Then you need to look at what the current internal policies are and how they need updating. You also need to consider the recent spate of high profile “right to be forgotten” cases and the landmark European ruling that Google should erase irrelevant and outdated data. To make sure you’re not falling foul of both statutes and precedents with irrelevant, inaccurate and outdated data, work together with both legal and data management experts from the outset of the strategy.
More Freedom with Cloud
It’s time to retire cumbersome legacy systems comprising on-premise infrastructure, tapes and disks and leverage the agile, secure, scalable and generally more cost-effective solutions available in the cloud. It’s best to work with an expert partner to determine if some of your data still needs to be kept on-premise for instant access, for example by local applications on virtual machines, as on-premise can be slightly faster than cloud in some circumstances.
For unstructured data, the scalability and flexibility of a cloud solution is usually the best option, particularly now that cloud storage solutions have reached a new level of maturity with low set-up and migration costs, a smoother transition, and access to all types of unstructured data at any time, from any device. Ensure that your cloud solution is customisable, so you can easily group your different types of data according to factors that matter to your business and test organisation, and then archive, retain, retrieve, utilise or delete as needed.
When it comes to sensitive data, it makes sense for your business to retain control over it rather than passing it to a vendor’s proprietary cloud. Also ensure that the contract with your cloud vendor does not involve any kind of data conversion that would make it impossible for you to use your data without expensive conversion, should you decide to change vendors. One of the best ways to avoid any shackles is to go with an open standard public cloud solution that allows you to maintain control over your data in its original format. At Sogeti, we use Microsoft Azure for our cloud-based data, testing and development services, delivered by our unique OneShare portal. This enables organisations to maintain control over their data for improved governance, easier data discovery, simple records management, easier access to better quality test data and environments, and improved data analytics, due to the easy integration of third-party applications. So, by leveraging a cloud-native solution you can optimise the cost and speed of your storage and archiving for better test data management and wider insights-driven business decisions.
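To make the tiering idea concrete: in Azure Blob Storage, this kind of age-based archiving can be expressed as a lifecycle management policy. The fragment below is a sketch only — the rule name, container prefix and day thresholds are illustrative assumptions, not a recommendation.

```json
{
  "rules": [
    {
      "name": "archive-stale-test-data",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "testdata/" ]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 90 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
```

A policy like this moves infrequently accessed blobs to cheaper tiers automatically, so primary (hot) storage holds only the data you actively use.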
You can find out more about our cloud testing and development services here, our Test Data Management services here and our ongoing GDPR services here.
Sogeti UK
Make an enquiry: 0330 588 8000