How To: Use Zero-Width Characters to Hide Secret Messages in Text (& Even Reveal Leaks)

You may be familiar with image-based or audio-based steganography, the art of hiding messages or code inside of pictures or audio files, but that's not the only way to conceal secret communications. With zero-width characters, we can use text-based steganography to stash hidden information inside of plain text, and we can even figure out who's leaking documents online.

Null Byte has covered image- and audio-based steganography several times; both involve changing the least significant bit of individual pixels in a photo or of samples in an audio file. While plain text characters don't have a least significant bit that we can manipulate in the same fashion, we can still use Unicode to our advantage. Unicode is the standard character set for text, and UTF-8 is the encoding of it that most web pages use.

Because Unicode needs to support almost all written languages in the world, there are some counterintuitive characters such as zero-width non-joiners and zero-width spaces. For example, the zero-width non-joiner is used in languages such as Persian, where it's needed to display the correct typographic form of words.

With the non-joiner in place, the stroke connecting the letters is no longer continuous, and that's what is meant by a non-joiner. However, for our purposes, the most important part about these character types is that they're not needed in English and aren't normally displayed.

That fact allows us to pick two arbitrary zero-width characters and designate them as one and zero. We can then hide any message in plain text by splitting it into single characters and encoding it in binary with zero-width characters acting as the ones and zeros. The best practice is to add the zero-width binary code in the spaces between words. Otherwise, spellcheckers tend to think the word is misspelled.

Plain ‌​‌​‌​​​‍‌​​‌​‌‌‌‍‌​​‌​​​​‍‌​​‌‌‌‌​‍‌‌​‌​​‌‌‍‌‌​‌‌‌‌‌‍‌​​​‌​​​‍‌​​‌​‌‌‌‍‌​​‌‌‌‌​‍‌​​​‌​‌‌‍‌‌​‌‌‌‌‌‍‌​​‌​‌​​‍‌​​‌​‌‌​‍‌​​‌​​​‌‍‌​​‌‌​‌‌‍‌​​‌‌‌‌​‍‌‌​‌‌‌‌‌‍‌​​‌​​​‌‍‌​​‌​‌‌​‍‌​​‌​​​‌‍‌​​‌​‌​‌‍‌​​‌‌‌‌​‍‌‌​‌‌‌‌‌‍‌​​​‌‌​​‍‌​​‌​‌‌‌‍‌​​‌​‌‌​‍‌​​​‌​‌‌‍‌‌​‌‌‌‌‌‍‌​​‌​‌‌​‍‌​​​‌‌​​‍‌‌​‌‌‌‌‌‍‌​​​‌​‌‌‍‌​​‌​‌‌‌‍‌​​‌​‌‌​‍‌​​​‌‌​​‍‌‌​​​​​​‍‌‌​‌‌‌‌​‍‌‌​​​​​​‍‌‌​‌‌‌‌​text, nothing to see here

To see the concept in action, copy the "Plain text, nothing to see here" line above and paste it into an online zero-width detection tool to see what it says.
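The encoding described above can be sketched in a few lines of Python. This is only an illustration, not the scheme used by any particular tool: the choice of ZERO WIDTH SPACE for zero and ZERO WIDTH NON-JOINER for one, and the function names hide and reveal, are assumptions made for the example.

```python
# Assumed mapping for this sketch; any two zero-width characters would do.
ZERO = "\u200b"  # ZERO WIDTH SPACE stands in for binary 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands in for binary 1


def hide(cover: str, secret: str) -> str:
    """Return cover with secret tucked in next to its first space."""
    # Turn every UTF-8 byte of the secret into eight bits.
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    # Place the invisible run at a word boundary rather than inside a word.
    head, sep, tail = cover.partition(" ")
    return head + hidden + sep + tail


def reveal(text: str) -> str:
    """Collect the zero-width characters and decode them back to text."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore any incomplete trailing byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


stego = hide("Plain text, nothing to see here", "hi")
print(len(stego), repr(reveal(stego)))
```

Printed or pasted, the stego string looks identical to the cover text; only its length gives it away.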

What Can I Use It For?

The ability to hide messages in otherwise ordinary-looking text is useful on its own, but what makes the technique really nifty is the fact that it also survives reformatting and goes wherever the text is copied and pasted. The hidden characters don't even show up in text editors like nano.

The most apparent use of the technique is as a means of covert communication. You can use the classic spy trick of publishing an article or some type of text document in a public space. For example, you could hide a secret message in a Craigslist ad, then have an individual recipient or group of people periodically check local Craigslist ads for a specific keyword. They would know to check the description for hidden zero-width character messages.

It's a beneficial approach to communicating when it's imperative that the two individuals not be seen having direct contact.

A slightly more sophisticated implementation would be the age-old canary trap. If you've ever read any mystery novel, you may be familiar with how the trap works. When you're suspicious that individuals are leaking information, you go to each person and give them slightly different info, and then you wait for that info to appear where it shouldn't be. Based on the version, you'd know which individual leaked the info.

Congressional aides are known to trace leaks by planting intentionally misspelled words or other small grammatical errors in documents provided to other offices and lobbyists. The problem with doing that, though, is that if anyone can get a hold of two different versions, they immediately know what you're up to. That's where zero-width characters can come to the rescue.

By using the zero-width characters, the average user is far less likely to notice, even if they do get a hold of two versions of the document or text. And don't think that someone could get off scot-free merely by taking a screenshot of the text or photocopying it on a printer. Both of those come with their own challenges, which could be even more damning, such as the EXIF data that point right back to your device and the microdots employed by many printers that uniquely identify pages printed on them.
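A canary trap built on zero-width characters could look roughly like the Python sketch below. Everything in it is hypothetical: the 16-bit numeric ID, the two characters chosen as bits, and the function names fingerprint and identify are assumptions for illustration only.

```python
# Assumed bit characters and ID width for this sketch.
ZERO = "\u200b"  # ZERO WIDTH SPACE = 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER = 1
ID_BITS = 16


def fingerprint(document: str, recipient_id: int) -> str:
    """Return a copy of document carrying an invisible recipient ID."""
    if not 0 <= recipient_id < 2 ** ID_BITS:
        raise ValueError("recipient_id does not fit in ID_BITS")
    bits = f"{recipient_id:0{ID_BITS}b}"
    mark = "".join(ONE if bit == "1" else ZERO for bit in bits)
    # Hide the mark at the first word boundary.
    head, sep, tail = document.partition(" ")
    return head + mark + sep + tail


def identify(leaked: str):
    """Return the recipient ID found in leaked text, or None if unmarked."""
    bits = "".join("1" if ch == ONE else "0" for ch in leaked if ch in (ZERO, ONE))
    if len(bits) < ID_BITS:
        return None
    return int(bits[:ID_BITS], 2)


memo = "The merger closes on Friday."
copies = {rid: fingerprint(memo, rid) for rid in (1, 2, 3)}
print(identify(copies[2]))
```

Each recipient gets a copy that reads the same on screen, and a leaked copy that was copied and pasted intact maps back to one ID.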

Lastly, zero-width characters can be used to change URLs. Unfortunately, you can't register a domain name with zero-width characters in it due to ICANN rules. However, it can still be quite useful in homograph attacks on a local network. It can also just be used to break a URL. Just look at my GitHub URLs below.

https://github.com/holdTheDoorHoid
https://github.com/hold​TheDoorHoid

They may look the same, but the second one has a zero-width character after "hold," which prevents it from working correctly.
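One way to check a URL for this kind of tampering is to scan it for known zero-width code points. The Python sketch below is an assumption-laden example: the set of characters to look for is a short, non-exhaustive list, and the example URL is written with a ZERO WIDTH SPACE, which may not be the exact character used in the link above.

```python
import unicodedata

# Non-exhaustive set of invisible characters, assumed for this sketch.
SUSPECT = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u180e"}


def find_hidden(url: str):
    """Return (index, Unicode name) for every suspect character in url."""
    return [
        (index, unicodedata.name(ch, "UNKNOWN"))
        for index, ch in enumerate(url)
        if ch in SUSPECT
    ]


clean = "https://github.com/holdTheDoorHoid"
broken = "https://github.com/hold\u200bTheDoorHoid"
print(find_hidden(clean))
print(find_hidden(broken))
```

An empty list means none of the listed characters were found; it does not rule out other lookalike tricks.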

Now, let's see how to start actually using the technique.

Using Real Encryption for Extra Security

I cannot stress this enough: encoding a message into zero-width binary code is not sufficient encryption if you want to secure your messages. Anyone with the right program can easily decipher your messages without another form of encryption. The most basic thing you can do is alter the source code of whichever program you are using to use different zero-width characters as the one and zero. I'll go over how to do so in a later step.

But what you should really be using is a symmetric or asymmetric encryption scheme. An asymmetric encryption scheme like PGP will work best if only one person is the intended recipient. However, you may want to use symmetric encryption if the messages are designed for a group of people. Either way, you can then share a key beforehand and have a much more secure form of communicating.

If you choose to use PGP, make sure to read our guide to using PGP. And if you select the symmetric route, check out how to use EncryptPad.

My favorite tactic is based on the theory that no one searches for a secret compartment within a secret compartment. What do I mean by that? Instead of using PGP to encrypt a zero-width message, use zero-width characters within your PGP encrypted emails as an extra layer of verification.

You could use this in two ways. The first way would be to have a generic email body with the actual email being hidden within zero-width characters. The second approach would be to hide a particular codeword in the first sentence, then have the person responding use that codeword or a response codeword in their first sentence. I like the second tactic because if anyone does get a hold of the PGP key of the person you're corresponding with, it's highly unlikely that they'll think to look for the zero-width characters, and you'll immediately know they're compromised when they respond without the code.

Option 1: Create Zero-Width Messages on the Web

To get started sending zero-width messages, open up the Steganographr webpage and paste your (preferably encrypted) message in the "Private Message" field and a generic or otherwise benign message in the "Public Message" field, then click "Steganographize."

You'll then be able to copy the new message and do with it as you will. As long as it is copied and pasted, it will still have the hidden message intact.

When the message needs to be revealed, you can use the Steganographr webpage for that too. On the site, scroll to the bottom, paste the text into the "Reveal Private Message" field, and click "Desteganographize."

However, one problem with the online method is that we have to trust that the website isn't saving these messages or doing anything else nefarious. Luckily, the source code is provided for Steganographr, so we can simply copy it and host it on our own website if we want to go the extra mile.

Additionally, the source code allows us to edit it. The most useful thing to edit is which characters are used to represent spaces, zeros, and ones in our zero-width binary. To make the change, look for the "bin2hidden" and "hidden2bin" definitions; we just need to change the hex values to the new characters we want.

// Convert the ones, zeros, and spaces of the hidden binary data to their respective zero-width characters
function bin2hidden($str) {
    $str = str_replace(' ', "\xE2\x81\xA0", $str); // Unicode Character 'WORD JOINER' (U+2060) 0xE2 0x81 0xA0
    $str = str_replace('0', "\xE2\x80\x8B", $str); // Unicode Character 'ZERO WIDTH SPACE' (U+200B) 0xE2 0x80 0x8B
    $str = str_replace('1', "\xE2\x80\x8C", $str); // Unicode Character 'ZERO WIDTH NON-JOINER' (U+200C) 0xE2 0x80 0x8C
    return $str;
}

// Convert zero-width characters to hidden binary data
function hidden2bin($str) {
    $str = str_replace("\xE2\x81\xA0", ' ', $str); // Unicode Character 'WORD JOINER' (U+2060) 0xE2 0x81 0xA0
    $str = str_replace("\xE2\x80\x8B", '0', $str); // Unicode Character 'ZERO WIDTH SPACE' (U+200B) 0xE2 0x80 0x8B
    $str = str_replace("\xE2\x80\x8C", '1', $str); // Unicode Character 'ZERO WIDTH NON-JOINER' (U+200C) 0xE2 0x80 0x8C
    return $str;
}

Below is a list of useful zero-width characters and their hex codes.
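The hex codes are simply each character's UTF-8 bytes, so they can be derived rather than memorized. A minimal Python sketch follows; the candidate set is my own selection for illustration, and the helper name php_escape is hypothetical.

```python
import unicodedata

# Candidate characters chosen for this sketch; not an exhaustive list.
CANDIDATES = ["\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u180e"]


def php_escape(ch: str) -> str:
    """Format a character's UTF-8 bytes as a PHP-style \\xNN escape string."""
    return "".join(f"\\x{byte:02X}" for byte in ch.encode("utf-8"))


for ch in CANDIDATES:
    print(f"U+{ord(ch):04X}  {php_escape(ch):<14}{unicodedata.name(ch)}")
```

Not every candidate is invisible in every font or application, so it's worth testing a character where the text will actually be read before relying on it.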

For example, to change from a zero-width non-joiner to a Mongolian vowel separator, we would replace the hex code like so:

str_replace('1', "\xE1\xA0\x8E", $str)

Be sure to change the characters' hex codes the same way in both definitions. That way, it can adequately encode and decode.
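One way to avoid the two definitions drifting apart is to derive both directions from a single table. The Python sketch below mirrors the shape of bin2hidden and hidden2bin but is not Steganographr's code; the table contents are an assumed example with the Mongolian vowel separator swapped in for the one.

```python
# Single source of truth for the mapping (assumed values for this sketch).
TABLE = {
    " ": "\u2060",  # WORD JOINER separates the binary groups
    "0": "\u200b",  # ZERO WIDTH SPACE
    "1": "\u180e",  # MONGOLIAN VOWEL SEPARATOR
}

TO_HIDDEN = str.maketrans(TABLE)
# The decoder is built by inverting the same table, so it cannot disagree.
TO_BINARY = str.maketrans({hidden: visible for visible, hidden in TABLE.items()})


def bin2hidden(text: str) -> str:
    """Replace ones, zeros, and spaces with their zero-width stand-ins."""
    return text.translate(TO_HIDDEN)


def hidden2bin(text: str) -> str:
    """Replace zero-width stand-ins with visible ones, zeros, and spaces."""
    return text.translate(TO_BINARY)


encoded = bin2hidden("01101000 01101001")
print(hidden2bin(encoded))
```

Changing a character then means editing one line of the table instead of two matching function bodies.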

Option 2: Create Zero-Width Messages from the Command Line

Opening up a webpage every time you want to send and decode a message can get annoying, particularly if you're doing it often enough. Luckily, someone has already created a tool on GitHub, called ZWFP, for encoding zero-width messages, and it works from the command line for extra hacker coolness and convenience.

One thing to note, however, is that you won't be able to change the characters being used for the zero-width binary easily. It isn't really anything to worry about as long as you encrypt your message beforehand. But you will not be able to have one person encoding using the website and one decoding using the command line or vice-versa. They use different characters, so be sure that you're using the same one to encode and decode.

To start, make sure you have the Go programming language installed on your computer. If you don't have it already, you can visit Go's downloads page to see its options for all major operating systems. The site also has detailed install instructions, which you should not skip if it's the first time using Go, as you need to change the PATH environment variable.

With Go installed, you can fetch the GitHub repository for ZWFP from your working Go directory.

~/go$ go get -u github.com/vedhavyas/zwfp/cmd/zwfp/...

The files are so small they'll be done downloading practically before you even press Enter. There will be no indication that you have it except a new command prompt, but it should be there. Now, move to the deepest "zwfp" folder, then build the tool.

~/go$ cd ~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp
~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ go build

Again, nothing will say it worked except you'll get a new command prompt. From here, let's run the test tool.

~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ go test ./...

?       github.com/vedhavyas/zwfp/cmd/zwfp      [no test files]

Now, let's see how to use ZWFP. Unfortunately, its usage notes are pretty vague, so let's dive a little deeper into the correct way to use the tool.

~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ ./zwfp

Usage:
         ./zwfp CoverText Payload
                 Embeds Payload into CoverText

         ./zwfp SteganoText
                 Extracts Payload from SteganoText

First, let's encode a message with ZWFP using the two arguments. There's the plain-looking cover text, which is what everyone will see, then the hidden message. If we just want "CoverText" to be the cover text, and "Payload" to be the hidden message, it'd look like:

~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ ./zwfp CoverText Payload

CoverText

However, you'll need to save it inside a file, which you would likely want to do anyway. So, let's try a different cover text and payload out.

Let's go with "You see this right?" for the cover and "But this is a hidden message in that text." as the payload. (If you're using more than one word for an argument, that argument needs to be in quotation marks.) Then, let's save it to a new file in our Desktop directory. For our example, that would be a new "secret.txt" file.

~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ ./zwfp "You see this right?" "But this is a hidden message in that text." > /home/kali/Desktop/secret.txt

To see what the text file looks like, let's use nano to view it.

~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ nano /home/kali/Desktop/secret.txt

  GNU nano 4.9.2                /home/kali/Desktop/secret.txt
You see this right?

                                    [ Read 1 line ]
^G Get Help   ^O Write Out  ^W Where Is   ^K Cut Text   ^J Justify    ^C Cur Pos
^X Exit       ^R Read File  ^\ Replace    ^U Paste Text ^T To Spell   ^_ Go To Line

Great, the cover text is clearly visible. Now, let's try to see what's going on underneath the cover text. For that, we can open the same file in vim, which shows the hidden characters, so the file looks a lot different than it did in nano. Type :qa and hit Enter to quit vim.

What ZWFP's usage doesn't make clear is how to decode a message that's saved in a file. To do so, we'll cat the file, then use xargs -0 to pass its contents to ZWFP and view the hidden payload.

~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ cat /home/kali/Desktop/secret.txt | xargs -0 ./zwfp

Cover Text: You see this right?

Payload: But this is a hidden message in that text.

To understand what xargs -0 did, let's view parts of the xargs man page. The output below is condensed to just what you need to see about xargs and the -0 flag.

~/go/src/github.com/vedhavyas/zwfp/cmd/zwfp$ man xargs

XARGS(1)                     General Commands Manual                    XARGS(1)

NAME
       xargs - build and execute command lines from standard input

SYNOPSIS
       xargs [options] [command [initial-arguments]]

DESCRIPTION
       This  manual  page documents the GNU version of xargs.  xargs reads items
       from the standard input, delimited by blanks (which can be protected with
       double  or  single  quotes  or a backslash) or newlines, and executes the
       command (default is /bin/echo) one or more times with  any  initial-argu‐
       ments  followed  by  items  read from standard input.  Blank lines on the
       standard input are ignored.

       The command line for command is built up until it  reaches  a  system-de‐
       fined  limit (unless the -n and -L options are used).  The specified com‐
       mand will be invoked as many times as necessary to use up the list of in‐
       put  items.   In general, there will be many fewer invocations of command
       than there were items in the input.  This will normally have  significant
       performance benefits.  Some commands can usefully be executed in parallel
       too; see the -P option.

       Because Unix filenames can contain blanks and newlines, this default  be‐
       haviour is often problematic; filenames containing blanks and/or newlines
       are incorrectly processed by xargs.  In these situations it is better  to
       use the -0 option, which prevents such problems.   When using this option
       you will need to ensure that the program which  produces  the  input  for
       xargs  also uses a null character as a separator.  If that program is GNU
       find for example, the -print0 option does this for you.

       If any invocation of the command exits with a status of 255,  xargs  will
       stop  immediately without reading any further input.  An error message is
       issued on stderr when this happens.

OPTIONS
       -0, --null
              Input items are terminated by  a  null  character  instead  of  by
              whitespace,  and  the  quotes and backslash are not special (every
              character is taken literally).  Disables the end of  file  string,
              which is treated like any other argument.  Useful when input items
              might contain white space, quote marks, or backslashes.   The  GNU
              find -print0 option produces input suitable for this mode.
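The difference the -0 flag makes can be modeled in Python. This is a rough approximation of the splitting rules quoted above, not a reimplementation of xargs, and the function names are hypothetical.

```python
import shlex


def default_items(data: str):
    """Approximate default xargs splitting: blanks and newlines separate items."""
    return shlex.split(data)


def null_items(data: str):
    """Approximate xargs -0: only a null character ends an item."""
    items = data.split("\0")
    if items and items[-1] == "":
        items.pop()  # a trailing null terminator does not start a new item
    return items


contents = "You see this right?\n"
print(default_items(contents))
print(null_items(contents))
```

Without -0, the cover text would be broken into four separate arguments. With -0 and no null characters in the file, the whole file, trailing newline included, arrives as one argument, which matches the single SteganoText argument that the usage notes describe.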

Revealing Zero-Width Characters with a Chrome Extension

If you want to defend yourself against zero-width characters, the best way to do so is with a simple Chrome browser extension. The extension we'll be using will replace zero-width characters with various emojis. Go to the Chrome Web Store and add "Replace zero-width characters with emojis" to your browser. After that, restart Chrome.

Unfortunately, the extension won't run automatically, so you'll need to check each page when you're suspicious or curious. To run the extension, click its button (an "R" inside a gray square) to the right of your browser bar, and then click "Show me the $!"

The tool makes it fairly obvious when a document or webpage in Chrome has hidden messages using zero-width characters. The emojis it adds aren't there in the foreground of the original page, but they show that there are hidden characters in the background.

Interestingly, since it does a one-to-one replacement of the zero-width characters to various emojis, you can see the binary nature in emoji form. If you want to explore or modify the tool, you can find the source code on GitHub.

As useful as an extension can be, don't let it lure you into a false sense of security. It only detects zero-width characters within a webpage or document and not those that might be lurking in the URL.
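The same one-to-one replacement idea can be tried outside the browser. The Python sketch below is not the extension's code; the emoji assigned to each character and the function name expose are assumptions for illustration.

```python
# Assumed character-to-emoji mapping for this sketch.
MARKERS = str.maketrans({
    "\u200b": "🔵",  # ZERO WIDTH SPACE
    "\u200c": "🔴",  # ZERO WIDTH NON-JOINER
    "\u200d": "🟢",  # ZERO WIDTH JOINER
    "\u2060": "🟡",  # WORD JOINER
    "\ufeff": "🟣",  # ZERO WIDTH NO-BREAK SPACE
})


def expose(text: str) -> str:
    """Swap each known zero-width character for a visible emoji."""
    return text.translate(MARKERS)


sample = "Plain\u200b\u200c\u200c\u200b text"
print(expose(sample))
```

Because the function takes any string, it works on a URL just as well as on page text.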

Copying Text Without the Zero-Width Characters

If the Chrome extension doesn't work for you, or you want a more powerful tool that will allow you to remove the zero-width characters in a document, then you'll need to use a website called Diffchecker.

Diffchecker is a tool designed to highlight the differences between two documents that are similar, which makes it a potent tool for detecting the classic canary trap. Since not everyone knows about zero-width characters, people will sometimes try to use extra spaces and intentional misspellings to achieve the same effect. Diffchecker will both highlight these differences, if you have two versions of the document, and reveal zero-width characters as dots if you have just one version.

Simply open the website and copy and paste the text into the "Original Text" field. If there are any zero-width characters in the document, they'll start showing up as dots, usually grouped if it's some kind of message, like a username. If you have a second version, paste in the "Changed Text" field, then click "Find Difference" at the bottom of the page. Diffchecker is also perfect for finding zero-width characters in URLs and should be a go-to when investigating suspicious domains.
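If you'd rather compare two versions locally, Python's standard difflib module can do a comparable character-level diff. The sketch below is an illustration only, not how Diffchecker works, and the function name differences is hypothetical.

```python
from difflib import SequenceMatcher


def differences(original: str, changed: str):
    """List (operation, original span, changed span) for every mismatch."""
    matcher = SequenceMatcher(None, original, changed, autojunk=False)
    return [
        (tag, original[i1:i2], changed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


version_a = "The quick brown fox"
version_b = "The quick\u200b brown fox"
# repr-style output makes invisible characters show up as escapes.
print(differences(version_a, version_b))
```

Printing the result as a list shows the inserted character as an escape sequence, so it can't hide the way it does in rendered text.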

If you find zero-width characters, removing them is a simple matter of highlighting the block of dots and pressing Backspace or Delete. You can then safely copy the text, and decryptors will no longer be able to find a hidden message.
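The same cleanup can be scripted. Here is a minimal Python sketch, assuming a short, non-exhaustive list of zero-width characters; the function name strip_zero_width is hypothetical.

```python
import re

# Non-exhaustive character class of zero-width characters (an assumption).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff\u180e]")


def strip_zero_width(text: str) -> str:
    """Return text with every listed zero-width character removed."""
    return ZERO_WIDTH.sub("", text)


marked = "Confidential\u200b\u200c\u200b Announcement"
print(len(marked), len(strip_zero_width(marked)))
```

Keep in mind that some of these characters are legitimate in languages such as Persian, so blanket removal is only safe for text where they have no business being.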

If you do happen to get hold of two versions of the same document where they're trying to use the canary trap, it's possible to use the other version as a scapegoat and point the finger at its owner so that the document can't be traced back to you.

If you prefer something on your local computer, then you can use a program called Notepad++, which can be installed on Kali and other Linux distros. In Notepad++, copy and paste the text in, then on the top bar, click "Encoding," then "Encode in ANSI."

Confidential Announcement: ‌​‌​‌​​​‍‌​​‌​‌‌‌‍‌​​‌​​​​‍‌​​‌‌‌‌​‍‌‌​‌​​‌‌‍‌‌​‌‌‌‌‌‍‌​​​‌​​​‍‌​​‌​‌‌‌‍‌​​‌‌‌‌​‍‌​​​‌​‌‌‍‌‌​‌‌‌‌‌‍‌​​‌​‌​​‍‌​​‌​‌‌​‍‌​​‌​​​‌‍‌​​‌‌​‌‌‍‌​​‌‌‌‌​‍‌‌​‌‌‌‌‌‍‌​​‌​​​‌‍‌​​‌​‌‌​‍‌​​‌​​​‌‍‌​​‌​‌​‌‍‌​​‌‌‌‌​‍‌‌​‌‌‌‌‌‍‌​​​‌‌​​‍‌​​‌​‌‌‌‍‌​​‌​‌‌​‍‌​​​‌​‌‌‍‌‌​‌‌‌‌‌‍‌​​‌​‌‌​‍‌​​​‌‌​​‍‌‌​‌‌‌‌‌‍‌​​​‌​‌‌‍‌​​‌​‌‌‌‍‌​​‌​‌‌​‍‌​​​‌‌​​‍‌‌​​​​​​‍‌‌​‌‌‌‌​‍‌‌​​​​​​‍‌‌​‌‌‌‌​This is some confidential text that you really shouldn't be sharing anywhere else.

By changing the encoding of the document, you essentially break the zero-width characters, which rely on Unicode and UTF-8. The zero-width characters now look like complete gibberish and can easily be found when reading a document.
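The effect can be reproduced in Python by reading UTF-8 bytes back as a single-byte encoding. This sketch assumes "ANSI" means the Windows-1252 code page, which depends on the system, and the function name reopen_as_ansi is hypothetical.

```python
def reopen_as_ansi(text: str) -> str:
    """Reinterpret the UTF-8 bytes of text as Windows-1252 (assumed ANSI)."""
    # Bytes with no Windows-1252 meaning are shown as replacement characters.
    return text.encode("utf-8").decode("cp1252", errors="replace")


marked = "Confidential\u200b Announcement"
print(reopen_as_ansi(marked))
```

Each three-byte zero-width character turns into three visible characters, while plain ASCII text passes through unchanged.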

Zero-Width Characters Are Great for Hidden Messages

Zero-width characters are a useful tool to have. However, it's essential never to forget their limitations. If you're using them as a covert means of communication, you should always remember to encrypt your messages first. Otherwise, you're relying purely on the fact that no one will look for the hidden message.

And if you're using it as a means of flushing out leakers, it may not work if they're savvy and attempt to use screenshots or physically print out the documents. However, those methods also bring their own risks, and when employed all together, you still might be able to catch the culprit.

Thanks for reading! If you have any questions, you can ask here or on Twitter @The_Hoid.

Cover image by buntewelt/123RF (edited); Screenshots by Hoid/Null Byte
