The movie Minority Report depicts a future society where a ‘pre-crime’ police department uses the services of a team of savants to arrest murderers moments before they would have committed their deed. It’s science fiction, but modern analytical techniques being applied to corporate email communications may soon make it possible to spot serious impending crimes before they actually take place, bringing the world of Minority Report a step closer.
If you want to see in action the moral decadence that could so easily imperil Western civilisation, and if you also want to enjoy some interesting entertainment at other people’s expense, you could do a lot worse than log on to www.enronexplorer.com. The site provides more than 200,000 emails sent to, from and between Enron employees during the period 1999 to 2001, when Enron finally collapsed.
The little-known Federal Energy Regulatory Commission (FERC) of the United States has made these emails public to stimulate research into how corporate email data could be forensically analysed. The Enron Explorer website gives one way of conducting just such a forensic analysis, as well as offering the results of some particularly interesting suggested searches. It’s extraordinary how hubris, coupled with an apparent sense that no-one would ever discover what was really happening at Enron, led many executives who were doubtless originally perfectly respectable professionals to commit a whole host of serious errors of judgement and, eventually, crimes.
The Enron emails were investigated with a great deal of laborious forensic work once the authorities had access to Enron’s corporate email records. Increasingly, however, there’s a feeling that the importance of email inside the corporate environment is so great that a more efficient way needs to be found to vet emails.
The very fact that workplace emails constitute a record of such behaviour may come as a surprise to many people, not least those who are happy to chronicle their misbehaviour by email in the first place.
A permanent reminder
Many people are unaware that emails can be a permanent record. We may be tempted to think that because emails themselves often seem highly ephemeral, and can be written in a matter of moments, they aren’t a permanent record. But they are. Emails are in most cases as permanent as any written document. With many companies archiving emails for long periods of time, one that may take only a moment to send can survive for a long time.
If the enormous database of Enron emails teaches us anything, it is that people will often write something in an email they would never dream of putting in a formal hard-copy memo. Yet emails and memos are really equivalent and indeed emails are in a sense even more permanent as they can so readily be copied or forwarded electronically.
According to a survey conducted in March 2008 by Proofpoint - a company that offers unified email security and data loss prevention services - over one third of UK businesses with more than 20,000 employees regularly read or otherwise monitor emails going out from their corporate systems. In the US it is over 40%. Of course, the numbers of emails leaving the corporate network are only a small fraction of the total number of emails sent between employees every day. As the vetting process is currently primarily manual it’s not practical to monitor all internal emails as well as looking for the obvious breaches of confidentiality that might occur by sending sensitive information outside the corporate network.
Email searching techniques
So how do you find the needle of incriminating evidence in the haystack of all the innocuous emails circulating within a large business?
In a digital age, it is not surprising that forensic investigators are trying to find ways to use digital techniques to sift through emails more efficiently. It’s logical that in a world where modern internet search techniques are so powerful, it should be possible to devise some way of searching through emails with similar effectiveness. After all, we’ve all done web searches that produce extremely rapid results based on searching for a key word or phrase.
It’s already possible to use modern search tools to locate emails that contain key words or are from or to specific individuals. Any email system administrator can easily use the tools that come as standard on email servers to conduct such a task. These currently available search techniques are proving highly effective in analysing evidence for cases where (as usually happens) people do not try to conceal what they are saying and say the most injudicious things in emails.
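The kind of search described above can be illustrated with a minimal Python sketch. It is not how any particular email server implements searching; the `Email` record, the `search_emails` function and the example addresses are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Email:
    # Hypothetical, simplified email record used only for this sketch.
    sender: str
    recipients: List[str]
    subject: str
    body: str


def search_emails(emails, keywords=(), participants=()):
    """Return emails that mention any keyword or involve any listed person."""
    keywords = [k.lower() for k in keywords]
    participants = {p.lower() for p in participants}
    hits = []
    for email in emails:
        # Search subject and body together, ignoring case.
        text = f"{email.subject} {email.body}".lower()
        # Everyone the message was from or to.
        people = {email.sender.lower(), *(r.lower() for r in email.recipients)}
        if any(k in text for k in keywords) or people & participants:
            hits.append(email)
    return hits


# Made-up sample messages.
emails = [
    Email("alice@example.com", ["bob@example.com"], "Lunch", "Sandwiches at noon?"),
    Email("carol@example.com", ["dave@example.com"], "Q3 numbers",
          "Please shred the old reports."),
]

for hit in search_emails(emails, keywords=["shred"], participants=["bob@example.com"]):
    print(hit.sender, "-", hit.subject)
```

A plain substring match like this only works while people write openly, which is exactly the situation the paragraph above describes.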
For the future there are emerging techniques, drawing on leading-edge developments in areas such as natural language processing, in which search engines are programmed to look not only at the specific words used in a dialogue but also at the way in which those words are used.
The result of these new techniques, if properly deployed, is that meaning can be inferred and relayed to the searcher. Such methods may initially operate in a rather clumsy fashion, but there is little doubt that the programs performing these types of searches will to some extent be capable of self-learning and self-improvement.
Of course, the above techniques work best when you already know that a situation needs detailed forensic analysis. What if you don’t know that a specific crime has taken place but want to see if there is suspicious behaviour underway? This is where a relatively mature technology known as ‘link analysis’ comes in.
Link analysis
Link analysis is where you use the information stored in all corporate email systems that shows who sent a message to whom and when. By studying these patterns you can build up a picture of the interactions between people using the corporate infrastructure, and from there try to deduce who may have been involved in the sort of activities that resulted in the downfall of Enron.
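As a rough sketch of the idea, the Python below counts how often each sender wrote to each recipient and how many distinct contacts each person has. It assumes the email log has already been reduced to (sender, recipient, time sent) records; the function names and sample data are hypothetical, and real link-analysis tools go considerably further.

```python
from collections import Counter, defaultdict
from datetime import datetime


def build_link_counts(messages, since=None):
    """Count messages per (sender, recipient) pair, optionally from a start time."""
    links = Counter()
    for sender, recipient, sent_at in messages:
        if since is None or sent_at >= since:
            links[(sender, recipient)] += 1
    return links


def contacts_per_person(links):
    """Number of distinct people each person exchanged mail with."""
    contacts = defaultdict(set)
    for sender, recipient in links:
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
    return {person: len(others) for person, others in contacts.items()}


# Hypothetical log entries: (sender, recipient, time sent).
messages = [
    ("alice", "bob", datetime(2001, 3, 1, 9, 0)),
    ("alice", "bob", datetime(2001, 3, 2, 17, 30)),
    ("bob", "carol", datetime(2001, 3, 3, 8, 15)),
    ("alice", "carol", datetime(2001, 3, 4, 11, 45)),
]

links = build_link_counts(messages)
print(links.most_common(1))        # the busiest sender-recipient pair
print(contacts_per_person(links))  # how widely connected each person is
```

Only the message headers are used here, not the content, which is why this kind of analysis can be run without knowing in advance what to search for.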
Today, link analysis is being combined with ‘text mining’ (deriving patterns and trends from words used in text through statistical analysis) and ‘data visualisation’ (presenting the results of any data analysis in a graphical or diagrammatic form rather than as a list of numbers or further text) to try to find signs of errant behaviour. These techniques will surely become more refined in time. Ideally, future Enrons will be nipped in the bud.
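The simplest form of the ‘text mining’ mentioned above is counting which words appear most often in a set of messages. The following Python sketch does only that; the stop-word list, the function name and the sample text are assumptions made for illustration, and production tools use far richer statistics.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, assumed for this sketch only.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "for", "on"}


def term_frequencies(bodies):
    """Count how often each non-trivial word occurs across email bodies."""
    counts = Counter()
    for body in bodies:
        words = re.findall(r"[a-z']+", body.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up message bodies.
bodies = ["Shred the files", "Shred it now"]
print(term_frequencies(bodies).most_common(3))
```

The resulting counts are the sort of raw material that a data visualisation step would then present as a chart rather than as a list of numbers.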
Certainly, applying these techniques retroactively to the Enron dataset shows beyond doubt that the behaviour that shocked the world could have been spotted early on if the right technology had been monitoring Enron’s email system.
However, if the executives within a business are the very people committing the crime then one is bound to ask:
- Who exactly should be carrying out the analysis and monitoring the findings?
- As the emerging technology begins to show who may be about to commit a criminal act, when should they be challenged?
- Should users of corporate email systems be allowed to treat their emails as private? Most companies of any size these days have a policy that clearly states that any email sent using the company system is subject to scrutiny. But how many people realise this, and would they start to object if they felt there was active monitoring of all emails rather than just retrospective analysis of ‘incidents’?
The technology to vet emails successfully from a range of powerful perspectives is coming. Indeed, some of it is already in place. And, as is so often the case, company executives, the law, and regulatory bodies, will need to ensure that they are not only keeping up with the technology but are, in a very real sense, ahead of it.
Otherwise, make no mistake, the bad guys - and girls - are going to win.
This article appeared in Commercial Crime International in September 2008.
1. INTRODUCTION
Application Integration is the biggest cost driver of corporate IT. While it has been popular to
emphasise the business process integration aspects of EAI, it remains true that data integration is a
huge part of the problem, responsible for much of the cost of EAI. You cannot begin to do process
integration without some data integration.
Data integration is an N-squared problem. If you have N different systems or sources of data to
integrate, you may need to build as many as N(N - 1) different data exchange interfaces between them –
near enough to N². For large companies, where N may run into the hundreds, and N² may be more
than 100,000, this looks an impossible problem.
In practice, the figures are not quite that huge. In our experience, a typical system may interface to
between 5 and 30 other systems – so the total number of interfaces is between 5N and 30N. Even this
makes a prohibitive number of data interfaces to build and maintain. Many IT managers quietly admit
that they just cannot maintain the necessary number of data interfaces, because the cost would be
prohibitive. Then business users are forced to live with un-integrated, inconsistent data and fragmented
processes, at great cost to the business.
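To make the arithmetic above concrete, here is a small Python sketch comparing the worst-case count of N(N - 1) interfaces with the practical range of 5N to 30N. The function names and the example values of N are assumptions chosen for illustration.

```python
def worst_case_interfaces(n):
    """Every system exchanges data with every other, in both directions."""
    return n * (n - 1)


def practical_interfaces(n, min_links=5, max_links=30):
    """Range when each system interfaces to between 5 and 30 others."""
    return (min_links * n, max_links * n)


# Hypothetical company sizes, measured in number of systems.
for n in (50, 200, 400):
    low, high = practical_interfaces(n)
    print(f"N={n}: worst case {worst_case_interfaces(n)}, in practice {low} to {high}")
```

With 400 systems the worst case is 159,600 interfaces, while the practical range is 2,000 to 12,000: much smaller, but still a very large number to build and maintain.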
The bad news is that N just got bigger. New commercial imperatives, the rise of e-commerce, XML
and web services require companies of all sizes to integrate data and processes with their business
partners’ data and processes. If you make an unsolved problem bigger, it generally remains unsolved.
Users and software vendors have devoted huge efforts to tackling the N² data integration problem.
The solutions available today can be grouped into four main levels of increasing sophistication and
power:
1. Hand coding of data interfaces
2. Source-to-target mapping and translation tools
3. Integration hubs and brokers
4. Full model-based integration
This article discusses the costs and benefits you can expect at each level.