I've been beating around the bush on a few different issues here. So what does my model look like? We could talk about information lifecycles, but I really want to talk about buckets, specifically this one:
Basically, we need to do one thing: stop the bucket from overflowing. To do this, we have three different options:
Reduce the inflow
This goal might be difficult to attain. In general, humans are information pack rats. We keep more than we know what to do with because we have a hard time determining the long-term value of information... and then we forget about it.
There might be localized opportunities for improvement. With email, for example, we can ensure that we have targeted spam filtering, etc. Managing email might involve a layered approach:
- People. Convince people to use email effectively and collect less crap. For example, "reply alls" with large attachments might not be such a good thing. It might also be valuable to assist people with actually getting stuff done. Email, for example, might not be the best way to manage certain types of documents or records.
- Process. Maintain appropriate email hygiene with blacklists, etc. Another process could be to communicate back with users to tell them how much email they're getting, and how to get rid of the email that they don't want. This exercise could be tied to general e-discovery or acceptable use exercises, or it could rely on reporting (e.g., who has a lot of unopened email in their inbox; see the sketch after this list).
- Technology. Get better spam filters or general email security gateway tools. Over 70% of inbound email is illegitimate, so managing that inflow is important.
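On the reporting point, even a crude unread-message report can start that conversation with users. A minimal sketch, assuming an IMAP-accessible mail server; the host name and account list below are placeholders, not real infrastructure:

```python
import imaplib

IMAP_HOST = "mail.example.org"  # hypothetical server name
ACCOUNTS = [("alice", "app-password-1"), ("bob", "app-password-2")]  # placeholder accounts


def unread_count(host: str, user: str, password: str, folder: str = "INBOX") -> int:
    """Count unread messages in one folder over IMAP."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(folder, readonly=True)          # read-only: report, don't touch
        status, data = conn.search(None, "UNSEEN")  # ids of unread messages
        return len(data[0].split()) if status == "OK" else 0


if __name__ == "__main__":
    for user, password in ACCOUNTS:
        print(f"{user}: {unread_count(IMAP_HOST, user, password)} unread messages")
```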
As for regular files, the inflow problem is certainly a challenge. It could be a matter of shifting users' perceptions of what should and shouldn't be kept. For example, can you help users clearly identify records that carry a retention requirement? Another issue might be personal information management: do we really want people clogging up shares with their personal stuff? Do we advocate for something a bit more personal? Users could, for example, have a personal drive that syncs with a local drive and becomes the place for all of their own documentation. Public shares then become the location for information with a clear retention period (e.g., project documentation). Another strategy could be the use of cloud services. Evernote, for example, does a great job of offloading the storage requirement for personal documents of uncertain value. It also, however, introduces the possibility of data leakage or loss.
Increase the size of the bucket
It's the brute force approach. Gear up: more storage, more servers, etc. A bigger bucket will take longer to fill up... but it will fill up. A few different things are required to increase the bucket. The first is really the effectiveness of our capacity management strategy. Ad hoc IT shops might not have anything in place for capacity management, so any expansion will come as a surprise. COBIT's BAI04 (Manage availability and capacity) gives us a run-down of the management practices we should, ideally, have in place, along with their outputs:
- Assess current availability, performance and capacity and create a baseline
  - Availability, performance and capacity baselines
  - Evaluations against SLAs
- Assess business impact
  - Availability, performance and capacity scenarios
  - Availability, performance and capacity business impact assessments
- Plan for new or changed service requirements
  - Prioritized improvements
  - Performance and capacity plans
- Monitor and review availability and capacity
  - Availability, performance and capacity reports
- Investigate and address availability, performance, and capacity issues
  - Performance and capacity gaps
  - Corrective actions
  - Emergency escalation procedure
That all seems a bit onerous.
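Even so, a capacity baseline can start small. A minimal sketch, assuming the volumes of interest are mounted locally and that a daily snapshot appended to a CSV is enough to spot trends (the paths and file name are placeholders):

```python
import csv
import shutil
from datetime import date
from pathlib import Path

VOLUMES = ["/srv/shares", "/var/mail"]          # hypothetical volumes to baseline
BASELINE_FILE = Path("capacity_baseline.csv")   # placeholder output file


def record_capacity() -> None:
    """Append today's capacity figures so trends (and surprises) are visible."""
    new_file = not BASELINE_FILE.exists()
    with BASELINE_FILE.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if new_file:
            writer.writerow(["date", "volume", "total_gb", "used_gb", "pct_used"])
        for volume in VOLUMES:
            usage = shutil.disk_usage(volume)
            writer.writerow([
                date.today().isoformat(),
                volume,
                round(usage.total / 1e9, 1),
                round(usage.used / 1e9, 1),
                round(100 * usage.used / usage.total, 1),
            ])


if __name__ == "__main__":
    record_capacity()  # run daily (e.g., from a scheduler) to build the baseline
```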
The one area where we can immediately increase the size of the bucket is through appropriate management of the resources we already have. For example, has everything been routinely purged and defragmented (ideally, with the white space overwritten)? Maintaining an Exchange database this way, for example, can provide some additional lifespan, although challenges can emerge from having to dismount databases for offline maintenance, etc.
Another approach is to not necessarily make the bucket bigger but to get another bucket. For example, one could implement email archiving or use Exchange archiving to improve performance and capacity.
These same challenges apply to file-based storage. Again, proper maintenance might be appropriate, and administrators can be proactive in culling collections without deleting the documents and records that are actually needed. Storage reports certainly help (e.g., Windows Server's FSRM reports like Duplicate Files, Large Files, Least Recently Accessed Files, and the File Screening Audit to eliminate files that contravene acceptable use policies -- MP3s, etc.). Such files either shouldn't be on the drives in the first place or are overly resource intensive.
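Where FSRM isn't available, or as a quick first pass, a rough equivalent of the Large Files, Least Recently Accessed Files, and File Screening reports can be scripted. A sketch, assuming a locally mounted share; the share path, disallowed extensions, and thresholds are all placeholders:

```python
import os
import time
from pathlib import Path

SHARE = Path("/srv/shares/public")        # hypothetical share to report on
DISALLOWED = {".mp3", ".avi", ".mov"}     # extensions that contravene acceptable use
LARGE_BYTES = 100 * 1024 * 1024           # flag files over 100 MB
STALE_DAYS = 365                          # flag files untouched for a year

now = time.time()
for root, _dirs, files in os.walk(SHARE):
    for name in files:
        path = Path(root) / name
        try:
            st = path.stat()
        except OSError:
            continue                      # skip files we can't read
        days_idle = (now - st.st_atime) / 86400
        if path.suffix.lower() in DISALLOWED:
            print(f"SCREEN  {path}")
        elif st.st_size >= LARGE_BYTES:
            print(f"LARGE   {path} ({st.st_size / 1e6:.0f} MB)")
        elif days_idle >= STALE_DAYS:
            print(f"STALE   {path} ({days_idle:.0f} days since last access)")
```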
Increase the outflow
Getting rid of stuff is hard. Humans really are terrible information hoarders. People keep more than they need because they tend to over-estimate the potential value of information; they then forget where they stored that information.
So, how can we help? We can encourage our users to clean out their inboxes and empty deleted items. Unfortunately, people lack strategies for actually accomplishing this goal and need some instruction. General guidance could be:
1. Sort email by sender. Delete junk. You will probably end up with a list of senders that you actually know because they are colleagues, partners, etc.
2. Sort by subject. Delete long meandering threads that have little persistent value.
3. Sort by date. Deal with everything older than four months.
In step 3 I used the expression "deal with". What do we do with information that might have value -- or might not? Some users may elect to export that information to a personal information management tool such as Evernote. Users might also be required to maintain some of that email as records and could export or forward those messages to a records management system. More likely, they will maintain the email in folders.
Determining a folder structure is inherently difficult due to the nature of categorization -- it is personal and idiosyncratic. The other issue is that people keep information for its use as a reminder. It's perhaps better to keep everything relatively flat and build in some sort of function for reminding.
David Allen's Getting Things Done (GTD) productivity technique offers a variety of suggestions for email management but basically adheres to an inbox-zero philosophy. It also suggests a minimal filing system:
- Inbox
- @ 1. Action. Anything requiring action goes into this folder. It's basically a to-do list.
- @ 2. Waiting. Anything for which one is waiting for a response goes in here. In some cases, the associated action may have been completed but we can't dispose of it.
- @ 3. A-Z. Information that must be kept goes into this folder. Users can create sub-folders for particular processes or projects. Encourage those users to take a functional approach, that is, to file things according to the necessary action. Some users will certainly create elaborate structures that will remain largely empty.
- @ 4. Archive. Information that might be necessary goes into the archive. Users can get at it via search. In old-school filing systems, this kind of collection would typically be organized by the name of the correspondent. Email clients do this for us automatically.
What's with the weird prefixes? Putting special characters at the beginning of a folder name enables us to group them and put them into some sort of logical order. Otherwise, the folders would be listed alphabetically which may -- or may not -- be of value.
Individual folders can also be associated with specific retention periods. Exchange, for example, now enables the use of specific retention tags to automate disposition. The challenge, of course, is getting the users to actually put email in the right locations!
Digital files also pose problems. Earlier this year, Mimi Dionne wrote a pair of articles about cleaning up file shares for CMSwire (article 1, article 2).
The first part of a file share cleanup is statistical analysis. Dionne recommends getting the following for each file (a collection sketch follows the lists below):
Must haves
- file name
- file extension
- file type
- volume
- size
- creation time
- owner user name
- last access time
- last modified time
- days since last access
- days since last modify
- days since file creation
Nice to have features include:
- attributes
- read only
- hidden
- system flag
- encrypted flag
- non content indexed
- duplicate key string
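A sketch of collecting the must-have fields with a simple filesystem walk; the share path and output file are placeholders, and the owner lookup assumes a POSIX host (on Windows it would come from the file's security descriptor instead):

```python
import csv
import os
import time
from pathlib import Path

SHARE = Path("/srv/shares/public")   # hypothetical share to inventory
OUTPUT = "file_inventory.csv"        # placeholder output file


def owner_of(path: Path) -> str:
    """Best-effort owner lookup; POSIX only."""
    try:
        import pwd
        return pwd.getpwuid(path.stat().st_uid).pw_name
    except (ImportError, KeyError, OSError):
        return "unknown"


now = time.time()
with open(OUTPUT, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["name", "extension", "size_bytes", "owner",
                     "created", "accessed", "modified",
                     "days_since_access", "days_since_modify", "days_since_create"])
    for root, _dirs, files in os.walk(SHARE):
        for name in files:
            path = Path(root) / name
            try:
                st = path.stat()
            except OSError:
                continue  # skip files we can't read
            writer.writerow([
                path.name, path.suffix.lower(), st.st_size, owner_of(path),
                time.ctime(st.st_ctime), time.ctime(st.st_atime), time.ctime(st.st_mtime),
                round((now - st.st_atime) / 86400),
                round((now - st.st_mtime) / 86400),
                round((now - st.st_ctime) / 86400),
            ])
```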
You can then start mapping the retention schedule to various combinations of keywords and extensions. Typically you will get:
- miscellaneous files (79%)
- container files (10%)
- data files (4.6%)
- text files (4.4%)
- temporary/backup files (1.3%)
- graphic files (0.2%)
- system files (0.2%)
- virtualization files (0.2%)
- database files
- office files and documents
- program files
- internet files
- software dev files
- video files
- configuration files
- mail files
- audio files
- help files
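As an illustration of the mapping step, here is a sketch that rolls the inventory from the earlier script up into categories by extension; the category table is a tiny assumed sample, not Dionne's mapping:

```python
import csv
from collections import Counter

# Assumed sample mapping; a real one would be much larger and tuned to the organization.
CATEGORIES = {
    ".zip": "container", ".7z": "container",
    ".csv": "data", ".xml": "data",
    ".txt": "text", ".log": "text",
    ".tmp": "temporary/backup", ".bak": "temporary/backup",
    ".jpg": "graphic", ".png": "graphic",
    ".docx": "office", ".xlsx": "office", ".pdf": "office",
    ".mp3": "audio", ".mp4": "video",
}

counts = Counter()
with open("file_inventory.csv", newline="") as handle:  # inventory from the earlier sketch
    for row in csv.DictReader(handle):
        counts[CATEGORIES.get(row["extension"], "miscellaneous")] += 1

total = sum(counts.values())
for category, count in counts.most_common():
    print(f"{category:20s} {100 * count / total:5.1f}%")
```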
This information can also be used to determine the relative age of documents and the impact of aggressive file retention periods. For example, what would a shorter retention period do to storage requirements, based on historical data?
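For example, a rough estimate of how much space a given retention period would free, again reading the assumed inventory CSV from above:

```python
import csv

RETENTION_DAYS = 730  # hypothetical two-year retention period

reclaimable = total = 0
with open("file_inventory.csv", newline="") as handle:
    for row in csv.DictReader(handle):
        size = int(row["size_bytes"])
        total += size
        if int(row["days_since_modify"]) > RETENTION_DAYS:
            reclaimable += size  # space that would be freed under this retention period

if total:
    print(f"{reclaimable / 1e9:.1f} GB of {total / 1e9:.1f} GB "
          f"({100 * reclaimable / total:.0f}%) is older than {RETENTION_DAYS} days")
```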
Study the patterns over time. These observations might encourage better conversations with end users about what should -- and what shouldn't -- be in the file share.
The content categories should identify "easy deletes", objects that are redundant, obsolete, or transitory (ROT). You could win back a quick 1%. Removing duplicates might get you another 2%.
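Duplicate detection is mostly mechanical: group files by size first, then hash only the candidates. A sketch, again assuming the same hypothetical share:

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

SHARE = Path("/srv/shares/public")   # hypothetical share


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Group by size first so we only hash files that could possibly be duplicates.
by_size = defaultdict(list)
for root, _dirs, files in os.walk(SHARE):
    for name in files:
        path = Path(root) / name
        try:
            by_size[path.stat().st_size].append(path)
        except OSError:
            continue

for size, paths in by_size.items():
    if len(paths) < 2:
        continue
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[sha256_of(path)].append(path)
    for digest, dupes in by_hash.items():
        if len(dupes) > 1:
            print(f"{len(dupes)} copies ({size} bytes): " + ", ".join(map(str, dupes)))
```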
It's also helpful to divide data between administrative and technical functions.
Beyond that, you might need a more sophisticated approach.