This article appeared in Lloyds List Newspaper on 14 September 2007.
A picture, they say, is worth a thousand words. But businesses wanting to guard against the threat of vital data being deliberately leaked to unauthorised people outside, or even inside, the organisation, need to get to grips with the alarming reality that a picture can also conceal a thousand words.
Or in some cases, even up to around 5,000 words. More than enough to betray all your most precious and commercially sensitive data — locations of newly-discovered oil fields; formulae for synthesising newly discovered molecules of breakthrough drugs costing millions or even billions to develop; designs of revolutionary products you are planning on being the first to bring to market; ultra-sensitive lists of hard-won customers; you name it.
The idea of data concealed in pictures might sound like the plot of the next Mission Impossible movie, but it is not. And unless you are prepared to let any Tom, Dick or Harry cruise around your precious data, you need to be aware of the threat it poses. The technique used is called steganography, from the ancient Greek meaning hidden or covered writing — just as the stegosaurus was named because its back was covered in bony plates, whose real purpose is a mystery even today.
But steganography was no mystery to the Ancient Greeks; indeed, they most likely invented it. The Greek historian Herodotus records that around 499 BC, Histiaeus of Miletus had the head of his most trusted slave shaved and tattooed with a vitally important secret message. Once the slave’s hair had grown back, hiding the message, Histiaeus sent him as an emissary to a friendly power via enemy territory to instigate a revolt against the Persians.
This example from history shows that steganographic writing can be a dangerous threat to security. Friends who betray us are always a more potent threat than people we recognise as enemies from the outset, and steganographic messages look friendly and innocent. You could devise a simple steganographic message by agreeing with your recipient that your real message will consist of the first letter of every word of your apparent message.
“Bring us your invoice by Monday”, for example, would really mean “BUY IBM”. In steganographic writing the apparent message is known as the cover text and the real message is called the plain text. The innocuous appearance of the cover text in the example illustrates why steganographic writing does not tend to set alarm bells ringing. It looks innocent, whereas the message “BUY IBM” encrypted in a simple code that consisted, say, of replacing each letter with the next letter in the alphabet (“CVZ JCN”) obviously looks suspect and would be certain to awaken the suspicions of even the most credulous member of an industrial espionage prevention team.
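The two schemes just described can be sketched in a few lines of Python. This is only an illustration of the example in the text; the function names `first_letters` and `shift_letters` are invented here and do not refer to any real tool.

```python
def first_letters(cover_text: str) -> str:
    """Return the message hidden in the first letter of every word."""
    return "".join(word[0].upper() for word in cover_text.split())


def shift_letters(message: str, shift: int = 1) -> str:
    """Replace each letter A-Z with the one `shift` places later in the alphabet."""
    shifted = []
    for ch in message.upper():
        if "A" <= ch <= "Z":
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            shifted.append(ch)  # leave spaces and punctuation untouched
    return "".join(shifted)


print(first_letters("Bring us your invoice by Monday"))  # BUYIBM
print(shift_letters("BUY IBM"))                          # CVZ JCN
```

The first output is read from a sentence that looks entirely ordinary, while the second is visibly scrambled, which is the contrast the paragraph draws.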
The point is that any encrypted message will tend to raise suspicions because, even though it cannot easily be read, you will know it has been encrypted and will instantly conclude that something fishy is going on.
In modern business, steganography has recently become a major issue. It has actually been a significant threat for several years, as the computing power available on the desktop has increased. But users have been distracted by publicity about cryptography, and steganography has rather remained in the background. It is a particularly worrying threat now because of the massive volume of electronic communications and the number of freely available tools that allow even a routine user to employ steganographic techniques.
By far the biggest type of threat is the potential for concealing steganographic writing within computerised images. In Microsoft Windows you can literally drag and drop your hidden text onto a picture and the deed is done.
Information remains the most valuable commodity, and it is precisely information that can so easily be given away or sold using image-based steganographic techniques. What is actually happening when you carry out what looks like a simple drag and drop? An electronic image is composed of thousands of ‘picture elements’ or ‘pixels’. Each pixel is stored as a binary number that defines the colour or shade of grey to be displayed at that point in the image. The binary number will look something like 10011011, depending on the pixel in question. The individual digits (each a 1 or a 0) are known as ‘bits’, and the further you go to the right, the less significant the bits become in defining the precise colour of the pixel.
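A small sketch shows how much less the rightmost bit matters than the leftmost. It assumes the example value 10011011 is a single 8-bit greyscale pixel, which the text does not specify.

```python
pixel = 0b10011011  # the example value from the text, 155 in decimal

flip_rightmost = pixel ^ 0b00000001  # toggle the least significant bit
flip_leftmost = pixel ^ 0b10000000   # toggle the most significant bit

print(pixel, flip_rightmost, flip_leftmost)  # 155 154 27
```

Toggling the rightmost bit moves the shade by one step out of 256, whereas toggling the leftmost bit moves it by 128 steps.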
Why does the opportunity for steganography exist? Because while each pixel is defined by a series of bits, some of these bits can be changed without affecting the resulting pixel to any discernible extent. In a computerised image whose size is 256 by 256 pixels, making a total of 65,536 pixels, there would easily be room to conceal, say, about 5,000 words of data. This method of concealment is known rather quaintly as ‘bit twiddling’. Because the altered picture shows no apparent changes, it is an obvious place to conceal a secret message.
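Bit twiddling can be sketched as follows. This is a minimal illustration, not the method of any particular tool: it assumes the image is a flat sequence of 8-bit greyscale values and hides one message bit in the least significant bit of each value. The names `embed` and `extract` are hypothetical.

```python
def embed(pixels: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bit of successive pixel bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this image")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0b11111110) | bit  # overwrite only the last bit
    return bytes(stego)


def extract(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes from the least significant bits."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for p in pixels[start:start + 8]:
            value = (value << 1) | (p & 1)
        out.append(value)
    return bytes(out)


cover = bytes([0b10011011] * (256 * 256))  # stand-in 256 x 256 greyscale image
stego = embed(cover, b"BUY IBM")
print(extract(stego, 7))                              # b'BUY IBM'
print(max(abs(a - b) for a, b in zip(cover, stego)))  # 1
print(len(cover) // 8)                                # 8192
```

No pixel value changes by more than one step. Under these assumptions the image holds 8,192 bytes; the capacity of a real image depends on how many bits per pixel and how many colour channels are used.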
Bit twiddling is the most common way to conceal text within a computerised image. There are many more techniques, though, particularly when using image formats such as the now ubiquitous JPEG, which many will have encountered through their digital cameras. So what is the best way to guard against image-based steganographic betrayal?
The first step is to recognise that it is a potential problem and get help to understand what tools are likely to be available to a malicious team member. You also need to know the manner in which these tools can be used because they often leave little trace of their presence. Some are even termed ‘zero footprint’ by those who develop them.
But help is at hand because dedicated teams of experts have been making tools available to help detect steganography. The technique they use is known as ‘steganalysis’. Steganalysis is as much an art as a science. The detection tools need to be deployed so that the appropriate steganalysis resource is used in the appropriate situation. Admittedly, this is not easy: steganography tools and their steganalysis counterparts have proliferated, and continue to proliferate, just as the threat from viruses did when they first emerged into the IT environment.
Charteris began its own anti-steganography work as a technical exercise but soon became alarmed at what its experiments were showing: not just about the power of the steganography tools available, but also about the degree of care that needs to be applied to combat this potent security hazard.
Taking the threat of betrayal by apparently innocuous pixels seriously will lead you to put into practice the measures necessary to defend against it. And you do need to take this threat very seriously indeed.
The stegosaurus may be long extinct, but steganographic treachery is, unfortunately, here to stay.
This article first appeared in Lloyds List, 14 September 2007.
1. INTRODUCTION
Application Integration is the biggest cost driver of corporate IT. While it has been popular to
emphasise the business process integration aspects of Enterprise Application Integration (EAI), it remains true that data integration is a
huge part of the problem, responsible for much of the cost of EAI. You cannot begin to do process
integration without some data integration.
Data integration is an N-squared problem. If you have N different systems or sources of data to
integrate, you may need to build as many as N(N-1) different data exchange interfaces between them,
near enough to N². For large companies, where N may run into the hundreds, and N² may be more
than 100,000, this looks an impossible problem.
In practice, the figures are not quite that huge. In our experience, a typical system may interface to
between 5 and 30 other systems – so the total number of interfaces is between 5N and 30N. Even this
makes a prohibitive number of data interfaces to build and maintain. Many IT managers quietly admit
that they just cannot maintain the necessary number of data interfaces, because the cost would be
prohibitive. Then business users are forced to live with un-integrated, inconsistent data and fragmented
processes, at great cost to the business.
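The arithmetic behind these figures can be checked with a short sketch. The function names and the sample values of N are chosen here purely for illustration and do not come from the article.

```python
def worst_case_interfaces(n: int) -> int:
    """One interface in each direction between every pair of systems: N(N-1)."""
    return n * (n - 1)


def typical_interfaces(n: int, per_system: int) -> int:
    """Each system interfaces to a fixed number of others, e.g. 5 to 30."""
    return n * per_system


for n in (100, 400):
    print(n, worst_case_interfaces(n), typical_interfaces(n, 5), typical_interfaces(n, 30))
# 100 9900 500 3000
# 400 159600 2000 12000
```

Even at the lower, more realistic end, a company with 400 systems would face between 2,000 and 12,000 interfaces to build and maintain.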
The bad news is that N just got bigger. New commercial imperatives, the rise of e-commerce, XML
and web services require companies of all sizes to integrate data and processes with their business
partners’ data and processes. If you make an unsolved problem bigger, it generally remains unsolved.
Users and software vendors have devoted huge efforts to tackling the N² data integration problem.
The solutions available today can be grouped into four main levels of increasing sophistication and
power:
1. Hand coding of data interfaces
2. Source-to-target mapping and translation tools
3. Integration hubs and brokers
4. Full model-based integration
This article discusses the costs and benefits you can expect at each level.