The bucket model of information management
Reduce the inflow
This goal might be difficult to attain. In general, humans are information pack rats. We keep more than we know what to do with because we have a hard time determining the long-term value of information... and then we forget about it.
There might be localized opportunities for improvement. With email, for example, we can ensure that we have targeted spam filtering, etc. Managing email might involve a layered approach:
- People. Convince people to use email effectively and collect less crap. For example, "reply alls" with large attachments might not be such a good thing. It might also be valuable to assist people with actually getting stuff done. Email, for example, might not be the best way to manage certain types of documents or records.
- Process. Maintain appropriate email hygiene with blacklists, etc. Another process could be to report back to users on how much email they're getting and how to get rid of the email they don't want. This exercise could be tied to general ediscovery or acceptable-use exercises, or it could rely on reporting (e.g., who has a lot of unopened email in their inbox).
- Technology. Get better spam filters or general email security gateway tools. Over 70% of inbound email is illegitimate so managing that inflow is important.
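At the process and technology layers, a sender blocklist is the simplest building block. A minimal sketch in Python, assuming a plain set of blocked sender domains (the domains below are invented placeholders, not a recommendation):

```python
# Minimal sketch of a sender-domain blocklist check.
# The blocklist contents are illustrative placeholders.
BLOCKLIST = {"spam.example", "bulkmail.example"}

def sender_domain(address: str) -> str:
    """Return the domain portion of an email address, lowercased."""
    return address.rsplit("@", 1)[-1].lower()

def is_blocked(address: str, blocklist=BLOCKLIST) -> bool:
    """True if the sender's domain appears on the blocklist."""
    return sender_domain(address) in blocklist

print(is_blocked("newsletter@spam.example"))  # True
print(is_blocked("colleague@corp.example"))   # False
```

A real gateway does far more (reputation scoring, content analysis), but even this shape makes the point: the inflow can be filtered before it ever lands in the bucket.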
Increase the size of the bucket
It's the brute force approach. Gear up: more storage, more servers, etc. A bigger bucket takes longer to fill up... but it will fill up. A few things are required to increase the bucket. The first is really the effectiveness of our capacity management strategy. Ad hoc IT shops might not have anything in place for capacity management, so any expansion will come as a surprise. COBIT's BAI04 gives us a run-down of what we should, ideally, have in place for capacity management:
- Manage availability and capacity
  - Assess current availability, performance and capacity and create a baseline
    - Availability, performance and capacity baselines
    - Evaluations against SLAs
  - Assess business impact
    - Availability, performance and capacity scenarios
    - Availability, performance and capacity business impact assessments
  - Plan for new or changed service requirements
    - Prioritized improvements
    - Performance and capacity plans
  - Monitor and review availability and capacity
    - Availability, performance and capacity reports
  - Investigate and address availability, performance, and capacity issues
    - Performance and capacity gaps
    - Corrective actions
    - Emergency escalation procedure
That all seems a bit onerous.
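Even without the full COBIT apparatus, a baseline plus an observed growth rate answers the basic capacity question: when does the bucket fill up? A back-of-the-envelope sketch (the volume figures are illustrative, not measurements):

```python
# Back-of-the-envelope capacity forecast: given a capacity baseline and an
# observed daily growth rate, estimate when the "bucket" fills up.
def days_until_full(capacity_gb: float, used_gb: float,
                    growth_gb_per_day: float) -> float:
    """Days remaining before the volume is full at the current growth rate."""
    if growth_gb_per_day <= 0:
        return float("inf")  # flat or shrinking usage never fills the bucket
    return (capacity_gb - used_gb) / growth_gb_per_day

# Example: a 2 TB volume, 1.4 TB used, growing roughly 3 GB/day.
print(days_until_full(2048, 1434, 3.0))  # ≈ 204.7 days
```

Tracking even this one number over time turns capacity expansion from a surprise into a scheduled purchase.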
The one area where we can immediately increase the size of the bucket is through appropriate management of the resources that we have. For example, has everything been routinely purged and defragmented (ideally, with the white space overwritten)? Maintaining an Exchange database, for example, can buy some additional lifespan. Challenges can emerge from having to dismount databases, etc.
Another approach is not necessarily to make the bucket bigger but to get another bucket. For example, one could implement a dedicated email archive or use Exchange's native archiving to improve performance and capacity.
These same challenges apply to file-based storage. Again, proper maintenance might be appropriate, and administrators can be proactive in culling collections without necessarily deleting necessary documents and records. Storage reports certainly help (e.g., Windows Server's FSRM reports like Duplicate Files, Large Files, Least Recently Accessed Files, and the File Screening Audit to eliminate files that contravene acceptable use policies -- MP3s, etc.). These files either shouldn't be on the drives in the first place or are overly resource intensive.
Increase the outflow
Getting rid of stuff is hard. Humans really are terrible information hoarders. People keep more than they need because they tend to over-estimate the potential value of information; they then forget where they stored that information.
So, how can we help? We can encourage our users to clean out their inboxes and empty deleted items. Unfortunately, people lack strategies for actually accomplishing this goal and need some instruction. General guidance could be:
- Sort email by sender. Delete junk. You will probably end up with a list of senders that you actually know because they are colleagues, partners, etc.
- Sort by subject. Delete long meandering threads that have little persistent value.
- Sort by date. Deal with everything older than four months.
For the mail that survives the purge, a simple folder scheme helps (the "@" prefix keeps these folders sorted to the top of the folder list):
- @ 1. Action. Anything requiring action goes into this folder. It's basically a to-do list.
- @ 2. Waiting. Anything for which one is waiting for a response goes in here. In some cases, the associated action may have been completed but we can't dispose of it.
- @ 3. A-Z. Information that must be kept goes into this folder. Users can create sub-folders for particular processes or projects. Encourage those users to take a functional approach, that is, to file things according to the necessary action. Some users will certainly create elaborate structures that will be largely empty.
- @ 4. Archive. Information that might be necessary goes into the archive. Users can retrieve it via search. In old-school filing systems, this kind of collection would typically be organized by the name of the correspondent. Email clients do this for us automatically.
The first part of a file share cleanup is statistical analysis. Dionne recommends getting the following for each file:
- file name
- file extension
- file type
- creation time
- owner user name
- last access time
- last modified time
- days since last access
- days since last modify
- days since file creation
Nice to have features include:
- read only
- system flag
- encrypted flag
- non content indexed
- duplicate key string
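Most of the core attributes can be gathered with standard library calls. The sketch below collects them (plus file size, which any later retention analysis needs); owner user name and the NTFS flags (read-only, system, encrypted, non-content-indexed) require platform-specific APIs and are omitted, and note that on most Unix filesystems `st_ctime` is metadata-change time rather than true creation time:

```python
import os
import time
from pathlib import Path

def inventory(root: str) -> list[dict]:
    """Walk a file share and collect per-file statistics."""
    now = time.time()
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # unreadable file; skip rather than abort the scan
            rows.append({
                "file_name": name,
                "file_extension": path.suffix.lower(),
                "size_bytes": st.st_size,
                "creation_time": st.st_ctime,  # metadata-change time on Unix
                "last_access_time": st.st_atime,
                "last_modified_time": st.st_mtime,
                "days_since_access": (now - st.st_atime) / 86400,
                "days_since_modify": (now - st.st_mtime) / 86400,
                "days_since_creation": (now - st.st_ctime) / 86400,
            })
    return rows
```

Dumped to a spreadsheet or database, this inventory is the raw material for everything that follows.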
You can then start mapping the retention schedule to various combinations of keywords and extensions. Typically you will get:
- miscellaneous files (79%)
- container files (10%)
- data files (4.6%)
- text files (4.4%)
- temporary/backup files (1.3%)
- graphic files (0.2%)
- system files (0.2%)
- virtualization files (0.2%)
- database files
- office files and documents
- program files
- internet files
- software dev files
- video files
- configuration files
- mail files
- audio files
- help files
This information can also be used to determine the relative age of documents and the impact of aggressive file retention periods. For example, what would a shorter retention period do to storage requirements, based on historical data?
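A sketch of both steps follows: a toy extension-to-category map, plus a function that estimates what a shorter retention period would do to storage. The category map and sample files are assumptions for illustration, not derived from the percentages above:

```python
# Toy extension-to-category map; a real one would cover far more extensions.
CATEGORIES = {
    ".docx": "office", ".xlsx": "office", ".pdf": "office",
    ".zip": "container", ".mp3": "audio", ".tmp": "temporary/backup",
}

def categorize(ext: str) -> str:
    """Map a file extension to a content category."""
    return CATEGORIES.get(ext.lower(), "miscellaneous")

def reclaimable_bytes(files: list[dict], max_age_days: float) -> int:
    """Bytes freed if everything older than max_age_days were disposed of."""
    return sum(f["size"] for f in files if f["age_days"] > max_age_days)

# Illustrative sample data standing in for a real inventory.
files = [
    {"ext": ".docx", "size": 2_000_000, "age_days": 900},
    {"ext": ".tmp",  "size": 500_000,   "age_days": 30},
    {"ext": ".zip",  "size": 8_000_000, "age_days": 1500},
]
print(categorize(".MP3"))             # audio
print(reclaimable_bytes(files, 365))  # 10000000
```

Running `reclaimable_bytes` across a range of cutoffs shows exactly how aggressive a retention period has to be before it meaningfully moves the storage needle.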
Study the patterns over time. These observations might encourage better conversations with end users about what should -- and what shouldn't -- be in the file share.
The content categories should identify "easy deletes", objects that are redundant, obsolete, or transitory (ROT). You could win back a quick 1%. Removing duplicates might get you another 2%.
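Duplicate detection is the mechanical part of that 2%: group candidate files by size first (cheap), then hash only the groups where sizes collide. A sketch:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(paths: list[str]) -> list[list[str]]:
    """Return groups of byte-identical files among the given paths."""
    by_size = defaultdict(list)
    for p in paths:
        try:
            by_size[os.path.getsize(p)].append(p)
        except OSError:
            continue  # unreadable file; skip it
    dupes = []
    for group in by_size.values():
        if len(group) < 2:
            continue  # a unique size cannot be a duplicate
        by_hash = defaultdict(list)
        for p in group:
            h = hashlib.sha256()
            with open(p, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 16), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(p)
        dupes.extend(g for g in by_hash.values() if len(g) > 1)
    return dupes
```

The size-first pass means most files are never read at all, which matters when the share holds millions of objects.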
It's also helpful to divide data between administrative and technical functions.
Beyond that, you might need a more sophisticated approach.