The bucket model of information management
Reduce the inflow
This goal might be difficult to attain. In general, humans are information pack rats. We keep more than we know what to do with because we have a hard time determining the long-term value of information... and then we forget about it.
There might be localized opportunities for improvement. With email, for example, we can ensure that we have targeted spam filtering, etc. Managing email might involve a layered approach:
- People. Convince people to use email effectively and collect less crap. For example, "reply alls" with large attachments might not be such a good thing. It might also be valuable to assist people with actually getting stuff done. Email, for example, might not be the best way to manage certain types of documents or records.
- Process. Maintain appropriate email hygiene with blacklists, etc. Another process could be to report back to users on how much email they're getting and how to get rid of the email they don't want. This exercise could be tied to general ediscovery or acceptable-use exercises, or it could rely on reporting (e.g., who has a lot of unopened email in their inbox).
- Technology. Get better spam filters or general email security gateway tools. Over 70% of inbound email is illegitimate so managing that inflow is important.
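At the process and technology layers, a sender blocklist is the simplest building block. A minimal sketch in Python, assuming a plain set of blocked sender domains (the domains below are invented placeholders, not a recommendation):

```python
# Minimal sketch of a sender-domain blocklist check.
# The blocklist contents are illustrative placeholders.
BLOCKLIST = {"spam.example", "bulkmail.example"}

def sender_domain(address: str) -> str:
    """Return the domain portion of an email address, lowercased."""
    return address.rsplit("@", 1)[-1].lower()

def is_blocked(address: str, blocklist=BLOCKLIST) -> bool:
    """True if the sender's domain appears on the blocklist."""
    return sender_domain(address) in blocklist

print(is_blocked("newsletter@spam.example"))  # True
print(is_blocked("colleague@corp.example"))   # False
```

A real gateway does far more (reputation scoring, content analysis), but even this shape makes the point: the inflow can be filtered before it ever lands in the bucket.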
Increase the size of the bucket
It's the brute force approach. Gear up: more storage, more servers, etc. A bigger bucket takes longer to fill up... but it will fill up. A few things are required to increase the bucket. The first is really the effectiveness of our capacity management strategy. Ad hoc IT shops might not have anything in place for capacity management, so any expansion will come as a surprise. COBIT's BAI04 gives us a run-down of what we should, ideally, have in place for capacity management:
- Manage availability and capacity
  - Assess current availability, performance and capacity and create a baseline
    - Availability, performance and capacity baselines
    - Evaluations against SLAs
  - Assess business impact
    - Availability, performance and capacity scenarios
    - Availability, performance and capacity business impact assessments
  - Plan for new or changed service requirements
    - Prioritized improvements
    - Performance and capacity plans
  - Monitor and review availability and capacity
    - Availability, performance and capacity reports
  - Investigate and address availability, performance, and capacity issues
    - Performance and capacity gaps
    - Corrective actions
    - Emergency escalation procedure
That all seems a bit onerous.
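Even without the full COBIT apparatus, a baseline plus an observed growth rate answers the basic capacity question: when does the bucket fill up? A back-of-the-envelope sketch (the volume figures are illustrative, not measurements):

```python
# Back-of-the-envelope capacity forecast: given a capacity baseline and an
# observed daily growth rate, estimate when the "bucket" fills up.
def days_until_full(capacity_gb: float, used_gb: float,
                    growth_gb_per_day: float) -> float:
    """Days remaining before the volume is full at the current growth rate."""
    if growth_gb_per_day <= 0:
        return float("inf")  # flat or shrinking usage never fills the bucket
    return (capacity_gb - used_gb) / growth_gb_per_day

# Example: a 2 TB volume, 1.4 TB used, growing roughly 3 GB/day.
print(days_until_full(2048, 1434, 3.0))  # ≈ 204.7 days
```

Tracking even this one number over time turns capacity expansion from a surprise into a scheduled purchase.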
The one area where we can immediately increase the size of the bucket is through appropriate management of the resources that we have. For example, has everything been routinely purged and defragmented (ideally, with the white space overwritten)? Maintaining an Exchange database, for example, can buy some additional lifespan. Challenges can emerge from having to dismount databases, etc.
Another approach is not necessarily to make the bucket bigger but to get another bucket. For example, one could implement a dedicated email archive or use Exchange's native archiving to improve performance and capacity.
These same challenges apply to file-based storage. Again, proper maintenance might be appropriate, and administrators can be proactive in culling collections without necessarily deleting necessary documents and records. Storage reports certainly help (e.g., Windows Server's FSRM reports like Duplicate Files, Large Files, Least Recently Accessed Files, and the File Screening Audit to eliminate files that contravene acceptable use policies -- MP3s, etc.). These files either shouldn't be on the drives in the first place or are overly resource intensive.
Increase the outflow
Getting rid of stuff is hard. Humans really are terrible information hoarders. People keep more than they need because they tend to over-estimate the potential value of information; they then forget where they stored that information.
So, how can we help? We can encourage our users to clean out their inboxes and empty deleted items. Unfortunately, people lack strategies for actually accomplishing this goal and need some instruction. General guidance could be:
- Sort email by sender. Delete junk. You will probably end up with a list of senders that you actually know because they are colleagues, partners, etc.
- Sort by subject. Delete long meandering threads that have little persistent value.
- Sort by date. Deal with everything older than four months.
For the mail that survives the purge, a simple folder scheme helps (the "@" prefix keeps these folders sorted to the top of the folder list):
- @ 1. Action. Anything requiring action goes into this folder. It's basically a to-do list.
- @ 2. Waiting. Anything for which one is waiting for a response goes in here. In some cases, the associated action may have been completed but we can't dispose of it.
- @ 3. A-Z. Information that must be kept goes into this folder. Users can create sub-folders for particular processes or projects. Encourage those users to take a functional approach, that is, to file things according to the necessary action. Some users will certainly create elaborate structures that will be largely empty.
- @ 4. Archive. Information that might be necessary goes into the archive. Users can retrieve it via search. In old-school filing systems, this kind of collection would typically be organized by the name of the correspondent. Email clients do this for us automatically.
The first part of a file share cleanup is statistical analysis. Dionne recommends getting the following for each file:
- file name
- file extension
- file type
- creation time
- owner user name
- last access time
- last modified time
- days since last access
- days since last modify
- days since file creation
Nice to have features include:
- read only
- system flag
- encrypted flag
- non content indexed
- duplicate key string
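Most of the core attributes can be gathered with standard library calls. The sketch below collects them (plus file size, which any later retention analysis needs); owner user name and the NTFS flags (read-only, system, encrypted, non-content-indexed) require platform-specific APIs and are omitted, and note that on most Unix filesystems `st_ctime` is metadata-change time rather than true creation time:

```python
import os
import time
from pathlib import Path

def inventory(root: str) -> list[dict]:
    """Walk a file share and collect per-file statistics."""
    now = time.time()
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # unreadable file; skip rather than abort the scan
            rows.append({
                "file_name": name,
                "file_extension": path.suffix.lower(),
                "size_bytes": st.st_size,
                "creation_time": st.st_ctime,  # metadata-change time on Unix
                "last_access_time": st.st_atime,
                "last_modified_time": st.st_mtime,
                "days_since_access": (now - st.st_atime) / 86400,
                "days_since_modify": (now - st.st_mtime) / 86400,
                "days_since_creation": (now - st.st_ctime) / 86400,
            })
    return rows
```

Dumped to a spreadsheet or database, this inventory is the raw material for everything that follows.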
You can then start mapping the retention schedule to various combinations of keywords and extensions. Typically you will get:
- miscellaneous files (79%)
- container files (10%)
- data files (4.6%)
- text files (4.4%)
- temporary/backup files (1.3%)
- graphic files (0.2%)
- system files (0.2%)
- virtualization files (0.2%)
- database files
- office files and documents
- program files
- internet files
- software dev files
- video files
- configuration files
- mail files
- audio files
- help files
This information can also be used to determine the relative age of documents and the impact of aggressive file retention periods. For example, what would a shorter retention period do to storage requirements, based on historical data?
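A sketch of both steps follows: a toy extension-to-category map, plus a function that estimates what a shorter retention period would do to storage. The category map and sample files are assumptions for illustration, not derived from the percentages above:

```python
# Toy extension-to-category map; a real one would cover far more extensions.
CATEGORIES = {
    ".docx": "office", ".xlsx": "office", ".pdf": "office",
    ".zip": "container", ".mp3": "audio", ".tmp": "temporary/backup",
}

def categorize(ext: str) -> str:
    """Map a file extension to a content category."""
    return CATEGORIES.get(ext.lower(), "miscellaneous")

def reclaimable_bytes(files: list[dict], max_age_days: float) -> int:
    """Bytes freed if everything older than max_age_days were disposed of."""
    return sum(f["size"] for f in files if f["age_days"] > max_age_days)

# Illustrative sample data standing in for a real inventory.
files = [
    {"ext": ".docx", "size": 2_000_000, "age_days": 900},
    {"ext": ".tmp",  "size": 500_000,   "age_days": 30},
    {"ext": ".zip",  "size": 8_000_000, "age_days": 1500},
]
print(categorize(".MP3"))             # audio
print(reclaimable_bytes(files, 365))  # 10000000
```

Running `reclaimable_bytes` across a range of cutoffs shows exactly how aggressive a retention period has to be before it meaningfully moves the storage needle.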
Study the patterns over time. These observations might encourage better conversations with end users about what should -- and what shouldn't -- be in the file share.
The content categories should identify "easy deletes", objects that are redundant, obsolete, or transitory (ROT). You could win back a quick 1%. Removing duplicates might get you another 2%.
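Duplicate detection is the mechanical part of that 2%: group candidate files by size first (cheap), then hash only the groups where sizes collide. A sketch:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(paths: list[str]) -> list[list[str]]:
    """Return groups of byte-identical files among the given paths."""
    by_size = defaultdict(list)
    for p in paths:
        try:
            by_size[os.path.getsize(p)].append(p)
        except OSError:
            continue  # unreadable file; skip it
    dupes = []
    for group in by_size.values():
        if len(group) < 2:
            continue  # a unique size cannot be a duplicate
        by_hash = defaultdict(list)
        for p in group:
            h = hashlib.sha256()
            with open(p, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 16), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(p)
        dupes.extend(g for g in by_hash.values() if len(g) > 1)
    return dupes
```

The size-first pass means most files are never read at all, which matters when the share holds millions of objects.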
It's also helpful to divide data between administrative and technical functions.
Beyond that, you might need a more sophisticated approach.