Hooked on Mnemonics Worked for Me

Dyre IE Hooks

I recently wrapped up my analysis of Dyre. A PDF of the document can be found in my papers repo. Most of the document focuses on the different stages in which Dyre interacts with the operating system. There are still some areas that I'd like to dig deeper into. For now it should be a good resource for anyone trying to identify a machine infected with Dyre or wanting to know more about the family of malware.

During the reversing process I found one part of Dyre's functionality worthy of a post. As with most banking trojans, Dyre contains functionality to hook APIs to log browser traffic. Typically, to get the addresses of the APIs, a sample will call GetProcAddress or manually traverse the portable executable file format to resolve symbols. If you are unfamiliar with the latter technique I'd highly recommend reading section 3.3 of "Understanding Windows Shellcode" by Skape [1].

Dyre attempts to hook APIs in firefox.exe, chrome.exe and iexplore.exe. It uses the standard GetProcAddress approach for resolving symbols in firefox.exe, is unsuccessful in chrome.exe, and uses the GetProcAddress approach for the APIs LoadLibraryExW and CreateProcessInternalW in iexplore.exe. Dyre also hooks two functions in WinInet.dll, but it does so in a unique way. Dyre reads the TimeDateStamp from the image file header [2] of WinInet.dll. This value records the time and date at which the linker created the file. It then compares the TimeDateStamp to a list of time stamps stored by Dyre. The list presumably contains every time stamp for WinInet.dll from '2004-08-04 01:53:22' to '2014-07-25 04:04:59'. Below is an example of the values that can be found in the list.

seg000:00A0C05F           db    0
seg000:00A0C060 TimeStampList dd 4110941Bh              ; DATA XREF: TimeStamp:_loopr
seg000:00A0C064 dword_A0C064 dd 0                       ; DATA XREF: TimeStamp+1Cr
seg000:00A0C064                                         ; TimeStamp:loc_A07A0Dr ...
seg000:00A0C068           dd 411095F2h <- Time stamp
seg000:00A0C06C           dd 0         <- WinInet index
seg000:00A0C070           dd 4110963Fh
seg000:00A0C074           dd 0
seg000:00A0C078           dd 4110967Dh
seg000:00A0C07C           dd 0
seg000:00A0C080           dd 411096D4h
seg000:00A0C084           dd 0
seg000:00A0C088           dd 411096DDh
seg000:00A0C08C           dd 0
seg000:00A0C090           dd 41252C1Bh
seg000:00A0C094           dd 0
.....
seg000:00A0C0AC           dd 1
seg000:00A0C0B0           dd 435862A0h
seg000:00A0C0B4           dd 2
seg000:00A0C0B8           dd 43C2A6A9h
seg000:00A0C0BC           dd 3
....
seg000:00A0D230           dd 4CE7BA3Fh
seg000:00A0D234           dd 78h
seg000:00A0D238           dd 53860FB3h
seg000:00A0D23C           dd 79h
seg000:00A0D240           dd 53D22BCBh
seg000:00A0D244           dd 7Ah

Values converted to local time (datetime.fromtimestamp uses the machine's time zone)

>>> import datetime
>>> datetime.datetime.fromtimestamp(0x411095F2).strftime('%Y-%m-%d %H:%M:%S')
'2004-08-04 01:53:22'

>>> datetime.datetime.fromtimestamp(0x53D22BCB).strftime('%Y-%m-%d %H:%M:%S')
'2014-07-25 04:04:59'    
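
For reference, the field being compared is the TimeDateStamp member of IMAGE_FILE_HEADER, which sits 8 bytes past the PE signature. Here is a minimal sketch of reading it with nothing but struct, in plain Python 3 outside of IDA. The function name read_timedatestamp and the synthetic header are my own illustration and are not taken from the sample.

```python
import struct

def read_timedatestamp(data):
    """Return IMAGE_FILE_HEADER.TimeDateStamp from a PE image held in bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset of the PE signature) is a DWORD at 0x3C in the DOS header
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # signature (4) + Machine (2) + NumberOfSections (2) precede TimeDateStamp
    (stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return stamp

# Synthetic header for illustration only: MZ, e_lfanew = 0x40, PE, i386, 1 section
fake = bytearray(0x60)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HHI", fake, 0x44, 0x014C, 1, 0x53D22BCB)

print(hex(read_timedatestamp(bytes(fake))))
```

Against a real file, read the first few kilobytes of wininet.dll and pass the bytes in.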

If the TimeDateStamp is not present in the list or an error occurs, Dyre will send the hash of WinInet.dll to the attacker's server. If the hash is not found there, it will send WinInet.dll itself back to the attackers. Below are some of the related strings, including the error messages reported to the command and control server.

'/%s/%s/63/file/%s/%s/%s/'
"Check wininet.dll on server failed"
"Send wininet.dll failed"

If the TimeDateStamp is found in the list, the value that follows it is used as an index into another list. For example, if the TimeDateStamp was 4802A13Ah it would be found at the 49th entry, and the next value would be 0x15 or 21.

Data
seg000:00A0C1E8           dd 4802A13Ah  <- '2008-04-13 18:11:38'
seg000:00A0C1EC           dd 15h  <- 21 index

Assembly to read index value

seg000:00A07A0D           movsx   edx, word ptr ds:TimeStampIndex[eax*8] ; edx = 21
seg000:00A07A15           lea     edx, [edx+edx*2] ; edx  = 63
seg000:00A07A18           mov     edx, ds:offset[edx*4]
seg000:00A07A1F           mov     [ecx], edx            ; save off value

Python: calculate offset
Python>hex(0x0A0D3E0 + (21+21* 2) * 4)
0xa0d4dc

Read
seg000:00A0D4DC           dw 0F3Ch  <- 0x0F3C, offset to the inline hook in WinInet

The value 0xF3C plus the base address of WinInet is the function prologue for ICSecureSocket::Send_Fsm. Dyre uses this to know the address at which to place its hooks.
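
The lookup described so far can be sketched in a few lines of Python 3. This is only my reading of the disassembly, not Dyre's code: the three list entries are copied from the listing above (the real list is far longer), the 12-byte entry size comes from the lea/mov pair, and the WinInet base of 0x77200000 is inferred from the debugger addresses in this post. The function names are made up for the example.

```python
# (TimeDateStamp, index) pairs copied from the listing; the real list is much longer
TIMESTAMP_LIST = [
    (0x411095F2, 0x00),
    (0x4802A13A, 0x15),
    (0x53D22BCB, 0x7A),
]
OFFSET_TABLE = 0x00A0D3E0  # base of the second table in this sample
ENTRY_SIZE = 3 * 4         # lea edx, [edx+edx*2] then [edx*4] -> 12 bytes per index

def offset_entry_address(timestamp):
    """Return the address of the offset-table entry, or None if unknown."""
    for stamp, index in TIMESTAMP_LIST:
        if stamp == timestamp:
            return OFFSET_TABLE + index * ENTRY_SIZE
    return None  # Dyre falls back to sending the hash to its server here

def hook_address(wininet_base, patch_offset):
    """The patch offset is relative to the loaded base of WinInet."""
    return wininet_base + patch_offset

print(hex(offset_entry_address(0x4802A13A)))  # 0xa0d4dc
print(hex(hook_address(0x77200000, 0x0F3C)))  # 0x77200f3c
```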

ICSecureSocket::Send_Fsm(CFsm_SecureSend *)
    
77200F37    90              NOP
77200F38    90              NOP
77200F39    90              NOP
77200F3A    90              NOP
77200F3B    90              NOP
77200F3C  - E9 C7F0398A     JMP 015A0008   <- Inline hook
015A0008    68 4077A000     PUSH 0A07740
015A000D    C3              RETN

00A07740    55              PUSH EBP
00A07741    8BEC            MOV EBP,ESP
00A07743    83EC 08         SUB ESP,8
00A07746    894D FC         MOV DWORD PTR SS:[EBP-4],ECX
00A07749    68 2077A000     PUSH 0A07720
00A0774E    FF75 08         PUSH DWORD PTR SS:[EBP+8]
00A07751    FF75 FC         PUSH DWORD PTR SS:[EBP-4]
00A07754    FF15 94DEA000   CALL DWORD PTR DS:[A0DE94]
00A0775A    8945 F8         MOV DWORD PTR SS:[EBP-8],EAX

It also hooks ICSecureSocket::Receive_Fsm in the same fashion.
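
The bytes in the listing above can be reproduced by hand, which is a handy sanity check when identifying this kind of hook. A small sketch in Python 3 (the helper names are mine): the E9 jump takes a 32-bit displacement relative to the end of the 5-byte instruction, and the trampoline is a plain push imm32 followed by ret.

```python
import struct

def encode_jmp_rel32(src, dst):
    """E9 rel32: the displacement is relative to the end of the 5-byte jump."""
    rel = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel)

def encode_push_ret(dst):
    """68 imm32 / C3: push an absolute address, then return into it."""
    return b"\x68" + struct.pack("<I", dst) + b"\xC3"

print(encode_jmp_rel32(0x77200F3C, 0x015A0008).hex())  # e9c7f0398a
print(encode_push_ret(0x00A07740).hex())               # 684077a000c3
```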

Closing 
Rather than calling GetProcAddress (the hooked functions are not exported), Dyre stores the TimeDateStamp and patch offset of every known version of WinInet to avoid triggering heuristic-based scanners. It seems like an arduous approach but is still kind of cool. Another interesting fact is that Dyre has the ability to patch Trusteer's RapportGP.dll if it is found in the browser's memory. Dyre is actually a family of malware worthy of a deep dive. At first glance I ignored it because everything looked pretty cut & paste. I'd recommend that others check it out. If you find anything useful please shoot me an email. Cheers.

Hash Analyzed 099c36d73cad5f13ec1a89d5958486060977930b8e4d541e4a2f7d92e104cd21
  1. http://www.nologin.org/Downloads/Papers/win32-shellcode.pdf
  2. http://msdn.microsoft.com/en-us/library/ms680313.aspx

reg+displ

I have been reversing Dyre in my spare time. I'm hoping to have a full analysis out in the next week or two. Something kind of annoying about Dyre is that it uses what looks like a massive structure to store its data and function pointers. For example, in the image below we can see it passing a handle stored at [eax+0x130] to WaitForSingleObject.
Manually tracing the code or searching through all cross references to find what populated the value is kind of painful. Since the displacement value of 0x130 (304) is fairly unique, it can be targeted very easily in IDAPython.

import idautils
import idaapi
import idc
displace = {}
for func in idautils.Functions():
    flags = GetFunctionFlags(func)
    if flags & FUNC_LIB or flags & FUNC_THUNK:
        continue  
    dism_addr = list(FuncItems(func))
    for curr_addr in dism_addr:
        op = None
        index = None 
        idaapi.decode_insn(curr_addr)
        if idaapi.cmd.Op1.type == idaapi.o_displ:
            op = 1
        if idaapi.cmd.Op2.type == idaapi.o_displ:
            op = 2
        if op is None:
            continue
        # check the operand that actually holds the displacement
        if "bp" in idc.GetOpnd(curr_addr, op - 1):
            # ebp will return a negative number
            if op == 1:
                index = (~(int(idaapi.cmd.Op1.addr) - 1) & 0xFFFFFFFF)
            else:
                index = (~(int(idaapi.cmd.Op2.addr) - 1) & 0xFFFFFFFF)
        else:
            if op == 1:
                index = int(idaapi.cmd.Op1.addr)
            else:
                index = int(idaapi.cmd.Op2.addr)
        if index:
            if index not in displace:
                displace[index] = []
            displace[index].append(curr_addr)
                
The above code will create a dictionary of all the displacement values in known functions. A simple for loop can be used to find the address and disassembly of all uses for the defined displacement value.

Python>for x in displace[0x130]: print hex(x), GetDisasm(x)
0x10004f12 mov     [esi+130h], eax
0x10004f68 mov     [esi+130h], eax
0x10004fda push    dword ptr [esi+130h]  ; hObject
0x10005260 push    dword ptr [esi+130h]  ; hObject
0x10005293 push    dword ptr [eax+130h]  ; hHandle
0x100056be push    dword ptr [esi+130h]  ; hEvent
0x10005ac7 push    dword ptr [esi+130h]  ; hEvent
Python>
With the addresses, it is easy to find where the value is populated.


The dictionary created by the script is named displace. It will contain all displacement values. Not super 1337 but still useful. Cheers.
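
The only non-obvious part of the script is the EBP branch, so here is the arithmetic on its own in Python 3, outside of IDA. The decoder hands back [ebp-0Ch] as the unsigned 32-bit value 0xFFFFFFF4, and the ~(x - 1) & 0xFFFFFFFF expression is just a two's-complement negate. The tuples below are hypothetical stand-ins for what idaapi.decode_insn would provide; the first two addresses are borrowed from the output above and the third is made up.

```python
from collections import defaultdict

def displacement_magnitude(addr):
    """Two's-complement negate a 32-bit value: 0xFFFFFFF4 ([ebp-0Ch]) -> 0xC."""
    return ~(addr - 1) & 0xFFFFFFFF

# (instruction address, base register, raw 32-bit displacement)
decoded = [
    (0x10004F12, "esi", 0x130),
    (0x10005293, "eax", 0x130),
    (0x10001000, "ebp", 0xFFFFFFF4),  # hypothetical [ebp-0Ch] access
]

displace = defaultdict(list)
for ea, base, raw in decoded:
    index = displacement_magnitude(raw) if "bp" in base else raw
    if index:
        displace[index].append(ea)

for ea in displace[0x130]:
    print(hex(ea))
```

Note that a positive EBP displacement such as [ebp+8] goes through the same negate and ends up as a very large key, which is something to keep in mind when querying the dictionary for function arguments.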

Backtrace POC - Stack Strings

Example 1 Hex View
There are a number of tools that cover char strings in IDA. If you are not familiar with char strings, they are a low-hanging obfuscation technique to thwart analysts from viewing the strings inside of an executable. Some notable tools and posts on this topic are [1] & [2]. In the image above you can see the string DBG. Odds are that if we were viewing the executable in a hex editor or using strings, this wouldn't stick out.

Example 1 Assembly View
If we were watching the stack of the executable at run time we would see something constructed similar to the string/comment above.
Example 2
The code can be run in two modes. The first is by selecting the code and then double-clicking the script in IDA (ALT+F9). In the example above we can see the string "W32Time". My code attempts to reconstruct the stack memory. The buffer can be accessed via the list object.str_buff. In the Output window above you can see the content of the buffer dumped to standard out. This makes it easy to format the data and access it via an index. The commented data is an example of how the string would look on the stack in Ollydbg. The second way to execute the code is to pass an address within a function to object.run( address ). This will try to rebuild the stack for the whole function. All of this is done statically. Char strings that are populated via registers (such as mov [ebp+var_c], bl when bl is 0x4f in the example 1 image) are traced back using backtrace.py. For more details on backtrace please see the following link.
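
To make the idea concrete without IDA, here is a simplified Python 3 sketch of the reconstruction step only. It is not the code from this post: it assumes the (offset, immediate, width) stores have already been collected, uses ESP-relative offsets only, and the two stores are hypothetical instructions that happen to build "W32Time".

```python
import struct

WIDTH_FORMAT = {1: "<B", 2: "<H", 4: "<I"}

def rebuild_stack_buffer(writes, frame_size):
    """Replay (offset, immediate, width) stores into a zeroed buffer."""
    buf = bytearray(frame_size)
    for offset, value, width in writes:
        # x86 stores immediates little-endian, so the low byte lands first
        struct.pack_into(WIDTH_FORMAT[width], buf, offset, value)
    return bytes(buf)

def extract_strings(buf):
    """Split the buffer on NUL bytes and keep the non-empty runs."""
    return [run.decode("latin-1") for run in buf.split(b"\x00") if run]

# Hypothetical stores equivalent to:
#   mov dword ptr [esp+0], 54323357h
#   mov dword ptr [esp+4], 00656D69h
writes = [(0, 0x54323357, 4), (4, 0x00656D69, 4)]
print(extract_strings(rebuild_stack_buffer(writes, 16)))  # ['W32Time']
```

Splitting on NUL bytes is also why wide char strings are awkward: every other byte is a NUL, so each character comes out as its own run.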

As previously mentioned this topic has already been covered. I'm posting this code because it's a good example of using backtrace.py. I had fun working on this one. The code handles all examples I have found so far. There is an issue with formatting constructed wide char strings. Not exactly sure of the best approach. I tried to keep the data flexible so it should be easy to write a function to format the data.

[1]. Automatic Recovery of Constructed Strings in Malware by Jay Smith of FireEye - link
[2]. Finding Byte Strings using IDAPython by Jason Jones of Arbor Networks - link 

Repo - Link

Code for reviewing

"""
Author:
    Alexander Hanel 
Date:
    20140902
Version:
    1  - should be good to go.
Summary:
    Examples of using the backtrace library to rebuild strings

TODO:
    * How to deal with printing wide char strings?
    * What is the size of the frame buffer if GetFrameSize returns something
      smaller than the frame/stack index or the IDA does not recognize the function?

Notes:
    idaapi.o_phrase # Memory Ref [Base Reg + Index Reg]
    o_phrase   =  idaapi.o_phrase    #  Memory Ref [Base Reg + Index Reg]    phrase
    o_displ    =  idaapi.o_displ     #  Memory Reg [Base Reg + Index Reg + Displacement] phrase+addr

Useful Reads
    http://smokedchicken.org/2012/05/ida-rename-local-from-a-script.html
    http://zairon.wordpress.com/2008/02/15/idc-script-and-stack-frame-variables-length/
"""
import sys, os, logging, copy
from binascii import unhexlify
# Add the parent directory to Python Path
sys.path.append(os.path.realpath(__file__ + "/../../"))
# import the backtrace module
from backtrace import *

class Frame2Buff:
    def __init__(self):
        self.verbose = False
        self.func_start = idc.SelStart()
        # SelEnd() returns the following selected instruction
        self.func_end = SelEnd()
        self.esp = False
        self.ebp = False
        self.comment = True
        self.frame_size = None
        self.bt = None
        self.str_buff = None
        self.formatted_buff = ""
        self.format = True

    def run(self, func_addr=None):
        """ run and create Frame2Buff"""
        # check if code is selected or if using the whole function
        if self.func_start == BADADDR or self.func_end == BADADDR:
            if func_addr == None:
                if self.verbose:
                    print "ERROR: No addresses selected or passed"
                return None
        if func_addr:
            self.func_start = idc.GetFunctionAttr(func_addr, FUNCATTR_START)
            self.func_end = idc.GetFunctionAttr(func_addr, FUNCATTR_END)
        if self.func_start == BADADDR:
            if self.verbose:
                print "ERROR: Invalid address"
            # nothing to rebuild without a valid start address
            return None
        self.frame_size = GetFrameSize(self.func_start)
        try:
            self.bt = Backtrace()
            self.bt.verbose = False
        except ImportError:
            print "ERROR: Could not import Backtrace - aborting"
            return None
        self.func_end = PrevHead(self.func_end)
        self.populate_buffer()
        if self.format:
            self.format_buff()
        if self.comment:
            self.comment_func()

    def populate_buffer(self):
        curr_addr = self.func_start
        self.str_buff = list('\x00' * self.frame_size)
        while curr_addr <= self.func_end:
            index = None
            idaapi.decode_insn(curr_addr)
            # check if instr is MOV, [esp|ebp + index], variable
            if idaapi.cmd.itype == idaapi.NN_mov and idaapi.cmd.Op1.type == idaapi.o_displ:
                if "bp" in idc.GetOpnd(curr_addr, 0):
                    # ebp will return a negative number
                    index = (~(int(idaapi.cmd.Op1.addr) - 1) & 0xFFFFFFFF)
                    self.ebp = True
                else:
                    index = int(idaapi.cmd.Op1.addr)
                    self.esp = True
                if idaapi.cmd.Op2.type == idaapi.o_reg:
                    # value needs to be traced back
                    self.bt.backtrace(curr_addr, 1)
                    # tainted means the reg was xor reg, reg
                    # odds are being used to init var.
                    if self.bt.tainted != True:
                        last_ref = self.bt.refsLog[-1]
                        idaapi.decode_insn(int(last_ref[0]))
                        data = idaapi.cmd.Op2.value
                    else:
                        # tracked variable has been set to zero by xor reg, reg
                        curr_addr = idc.NextHead(curr_addr)
                        continue
                elif idaapi.cmd.Op2.type != idaapi.o_imm:
                    curr_addr = idc.NextHead(curr_addr)
                    continue
                else:
                    data = idaapi.cmd.Op2.value
                if data:
                    try:
                        hex_values = hex(data)[2:]
                        if hex_values[-1] == "L":
                            hex_values = hex_values[:-1]
                        if len(hex_values) % 2:
                            hex_values = "0" + hex_values
                        temp = unhexlify(hex_values)
                    except:
                        if self.verbose:
                            print "ERROR: Unhexlify Issue at %x %s (not added)" % (curr_addr, idc.GetDisasm(curr_addr))
                        curr_addr = idc.NextHead(curr_addr)
                        continue
                else:
                    curr_addr = idc.NextHead(curr_addr)
                    continue
                # GetFrameSize is not a reliable buffer size
                # If so append to buffer if index is less than
                # 2 * frame size. If more likely an error
                if self.ebp or self.esp:
                    cal_index = index + len(temp)
                    if cal_index > self.frame_size:
                        if cal_index < (self.frame_size * 2):
                            for a in range(cal_index - self.frame_size):
                                self.str_buff.append("\x00")
                                if self.verbose:
                                    print "ERROR: Frame size incorrect, appending"
                if self.ebp:
                    # reverse the buffer
                    temp = temp[::-1]
                    for c, ch in enumerate(temp):
                        try:
                            self.str_buff[index - c] = ch
                        except:
                            if self.verbose:
                                print "ERROR: Frame EBP index invalid: at %x" % (curr_addr)
                if self.esp:
                    for c, ch in enumerate(temp):
                        try:
                            self.str_buff[index + c] = ch
                        except:
                            if self.verbose:
                                print "ERROR: Frame ESP index invalid: at %x" % (curr_addr)
            curr_addr = idc.NextHead(curr_addr)
        # reverse the buffer to match index
        if self.ebp == True:
            self.str_buff = self.str_buff[::-1]
            self.str_buff.pop()



    def format_buff(self):
        self.formatted_buff = ""
        temp_buff = copy.copy(self.str_buff)

        if self.ebp == True:
            temp_buff = temp_buff[::-1]
            temp_buff.pop()

        if self.str_buff:
            for index, ch in enumerate(temp_buff):
                try:
                    if ch == "\x00" and temp_buff[index + 1] != "\x00":
                        self.formatted_buff += " "
                except:
                    pass
                if ch != "\x00":
                    self.formatted_buff += ch

    def comment_func(self):
        idc.MakeComm(self.func_end, self.formatted_buff)

"""
Example:
    Create a buffer of the whole function

x = Frame2Buff()
x.run(here())  # func addr

"""
x = Frame2Buff()
x.run() # select data

Renaming Simple Functions

Simple Function

The above function is very simple. Let's ignore the actual code and think about the code's functionality from a generic standpoint. The code pushes arguments onto the stack, calls APIs, compares return values from the APIs and then returns one or zero. In most instances these simple functions do not need to be analyzed. By reading the API names, most of the functionality can be inferred and the function easily renamed to something like "RegCreateAndSetValue". After seeing these simple functions many times I realized that many of them could be renamed automatically. If broken down into steps it would look like this.
  1. API names from a function are extracted
  2. Sub-strings from the APIs are extracted
  3. Search for a common sub-string throughout all API names. 
  4. If a sub-string is common throughout all, create a name from the sub-strings. 
Step 1

    def get_apis(self, func_addr):
        flags = GetFunctionFlags(func_addr)
        # ignore library functions
        if flags & FUNC_LIB or flags & FUNC_THUNK:
            logging.debug("get_apis: Library code or thunk")
            return None
        # list of addresses
        dism_addr = list(FuncItems(func_addr))
        for instr in dism_addr:
            tmp_api_address = ""
            if idaapi.is_call_insn(instr):
                # In theory an API address should only have one xrefs
                # The xrefs approach was used because I could not find how to
                # get the API name by address.
                for xref in XrefsFrom(instr, idaapi.XREF_FAR):
                    if xref.to == None:
                        self.calls += 1
                        continue
                    tmp_api_address = xref.to
                    logging.debug("get_apis: xref to %x found", tmp_api_address)
                    break
                # get next instr since api address could not be found
                if tmp_api_address == "":
                    self.calls += 1
                    continue
                api_flags = GetFunctionFlags(tmp_api_address)
                # check for lib code (api)
                # test the bits directly; "flags & FUNC_LIB == True" is never
                # true because FUNC_LIB is not 1
                if api_flags & idaapi.FUNC_LIB or api_flags & idaapi.FUNC_THUNK:
                    tmp_api_name = NameEx(0, tmp_api_address)
                    if tmp_api_name:
                        self.apis.append(tmp_api_name)
                else:
                    self.calls += 1

Step 2 & 3

    def match_apis(self):
        self.matched = False
        api_set = set(self.apis)
        # Optional Threshold. Only check functions with more than one API
        if self.calls <= self.threshold and len(self.apis) > 1:
            api_tokend  = []
            # for each api in function
            for api_name in api_set:
                # for each tokenized string in API name
                for item in self.tok.tokenizer(api_name):
                    if item is None or item == "A" or item == "W":
                        continue
                    api_tokend.append(item)
            # Count occurrence of strings.
            count_tmp = Counter(api_tokend)
            # if a common string is found in all APIs
            # return True and the count strings
            for string, count in count_tmp.items():
                if count == len(set(self.apis)):
                    self.matched = True
                    self.count_strings = count_tmp
                else:
                    logging.debug("match_apis: API count and API sub-string don't match")
        else:
            logging.debug("match_apis: calls above threshold or API count is 1")

A lot of the heavy lifting for parsing out the sub-strings is built into the tokenizer module in TT&SS. For more information and usage I'd recommend the following post

Step 4


    def create_string(self):
        if self.count_strings == "" or self.matched is False:
            return
        # Sort strings by highest occurrence
        sort = sorted(self.count_strings, key=self.count_strings.get, reverse=True)
        name = ""
        # if a function contains all the same API multiple times
        # might be possible to modify to deal with wrapper code also
        if self.calls == 0 and len(set(self.apis)) == 1 and len(self.apis) > 1:
            self.func_name = self.apis[0] + str(len(self.apis))
            return
        for each in sort:
            # ignore Wide or Ascii
            if each.upper() == "A" or each.upper() == "W":
                continue
            # Convert to CamelCase for easier reading and space
            tmp = each[0].upper() + each[1:]
            name += str(tmp)
        # replace white space with underscore
        name = name.replace(" ", "_")
        logging.debug("create_string: string created %s", name)
        self.func_name = name

If we were to apply that logic plus some other random stuff we would have the following..

I think this is pretty cool. I like the idea of applying other domains of knowledge, such as Natural Language Text Processing, to reversing. Sadly, simple functions whose APIs all contain a similar sub-string are rare. The rarity happens because a lot of functions that share similar functionality also use generic APIs such as "CloseHandle" to close out a process. This API does not contain any of the sub-strings so it will fail the similarity test. I'm currently toying with the idea of using thresholds on matches or whitelisting certain APIs. Creating API sets, as was done for the generic renaming of functions in IDAScope, is another option. The main issue with that approach is categorizing APIs by functionality. There are a lot of little things left to do for this project, which is why I'm releasing it as a POC. Below is the output of attempting to rename 456 functions in a Zeus IDB.


The VirtualProtect2 contains a "2" because the API was called twice from a function. The API names that end with an underscore and a value are for calculated names that happen multiple times.
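
Steps 2 through 4 can be sketched outside of IDA in plain Python 3. Treat this as a rough approximation rather than w_sims.py itself: the regex is my stand-in for the TT&SS tokenizer, each sub-string is counted once per API, the call threshold is left out, and the function names are hypothetical.

```python
import re
from collections import Counter

# an acronym run (HTTP, A, W) or a capitalized word (Reg, Create, Key)
TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]+")

def tokenize(api_name):
    """Split a CamelCase API name into sub-strings, dropping the A/W suffix."""
    return [t for t in TOKEN_RE.findall(api_name) if t not in ("A", "W")]

def suggest_name(apis):
    """Return a name built from shared sub-strings, or None if nothing is shared."""
    if len(apis) < 2:
        return None
    unique = sorted(set(apis))
    if len(unique) == 1:
        # same API called several times, e.g. VirtualProtect twice
        return unique[0] + str(len(apis))
    counts = Counter()
    for api in unique:
        # dict.fromkeys de-duplicates while keeping the order stable
        counts.update(list(dict.fromkeys(tokenize(api))))
    if not any(n == len(unique) for n in counts.values()):
        return None  # no sub-string is common to every API
    ordered = sorted(counts, key=counts.get, reverse=True)
    return "".join(ordered)

print(suggest_name(["RegCreateKeyExA", "RegSetValueExA", "RegCloseKey"]))
print(suggest_name(["VirtualProtect", "VirtualProtect"]))
print(suggest_name(["CreateFileA", "CloseHandle"]))
```

The last call returns None, which is the "CloseHandle" problem described above: one generic API is enough to break the match.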

The source POC is named w_sims.py and can be found in the POCS dir in the repo. The source also contains some code to identify wrapper functions. The code is currently set up to run the SimilarFunctions and the wrapper class on all the known functions. If you would like to run only the wrapper class or experiment on other functions, tweak the execute options at the bottom of the code. The code is still being tweaked and fixed. I have been using this code off and on for a couple of weeks. I have seen some issues while importing the modules but I think I got those ironed out. If anything breaks or you have feedback, please leave a comment.

Random Applocker Thoughts

While reading through the Windows Internals book I came across an interesting feature called AppLocker.
AppLocker provides a robust experience for IT administrators through new rule creation tools and wizards. For example, IT administrators can automatically generate rules by using a test reference computer and then importing the rules into a production environment for widespread deployment. The IT administrator can also export a policy to provide a backup of the production configuration or to provide documentation for compliance purposes. Your Group Policy infrastructure can be used to build and deploy AppLocker rules as well, saving your organization training and support costs.
Source
The paragraph above is a good description of its functionality. The tool focuses on policy-based security, for example preventing kazaa.exe from running in an enterprise environment. AppLocker has three ways of blocking files from executing: publisher, path/file and hash. The publisher is by far the most interesting because it is a little more generic but still targets a decent characteristic. For example, say Company A is targeted by Malware B, which is always signed with a stolen or legitimate certificate XXX. Company A could create a policy or rule to target the publisher, product name, file name or file version of that signed file. If that rule is ever triggered, the file could be blocked or logged as an event and then fed into a SIEM. The second useful example is a mass spam campaign. If an organization received 15k emails with a zip attachment, it's almost guaranteed that 1% of that population will execute the file within the zip. This is especially true if inbox cleanup could take an hour or two. Most email spam campaign attachments have a static hash. If a user reported the spam campaign, an analyst could create an AppLocker rule based on the file hash and push it out as a new policy rule. Odds are the turnaround time on pushing out an AppLocker policy rule would be faster than getting an AV signature update.

Dear Microsoft Employees,
In future releases of AppLocker, could you please add an option to use the parent process as a filter for rules? For example, on slide 21 of Haifei Li's Exploring in the Wild: A Big Data Approach to Application Security Research (and Exploit Detection) [link], he mentions that out of half a million Microsoft documents the only time they saw MSCOMCTL.OCX loaded was during exploitation. If an AppLocker user could create a rule to alert when MSCOMCTL.OCX is loaded by a Microsoft application, it could give early warning of possible exploitation. The same concept could also be applied to Adobe Reader, Java and other commonly exploited third-party applications.

Disclaimer
I'm currently not working in an enterprise environment so I can't test these thoughts. Applocker is only available on premium versions of Windows 7 and up.

Kind of a cool feature. Here is a video overview. Anybody have any success stories using AppLocker?

garts.py


'''
Name:
    garts.py (Get all referenced text strings)
Version:
    1.0 
Date:
    2014/05/21
Author:
    alexander<dot>hanel<at>gmail<dot>com
Description:
    garts.py is a simple string viewer for IDA. It will iterate through
    all known functions, look for possible string references and then
    print them. This is super helpful for dealing with strings in Delphi
    executables, finding missed strings, having the exact location of
    where a string is being used or where data is possibly going to be
    decrypted.  

    Example Output 
    Address      String
    0x1a701045                  <- new line char. Not the best starting example..         
    0x1a7010bd   #command
    0x1a701199   SOFTWARE\Microsoft\Windows\CurrentVersion\Run
    0x1a7011be   govShell

    Xref Example
    .text:1A7010BD                 push    offset aCommand ; "#command"
    .text:1A7010C2                 lea     eax, [ebp+var_110]
    ....
    .text:1A701199                 push    offset SubKey   ; "SOFTWARE\\Microsoft\\Windows\\CurrentVersi"...
    .text:1A70119E                 push    80000001h       ; hKey
    .text:1A7011A3                 call    ds:RegOpenKeyA

    The script also calls the helpful idautils.strings and then adds all
    the found strings to the viewer window.

    Any ideas, comments, bugs, etc please send me an email. Cheers. 
'''

import idautils
class Viewer(idaapi.simplecustviewer_t):
    # modified version of http://dvlabs.tippingpoint.com/blog/2011/05/11/mindshare-extending-ida-custviews
    def __init__(self, data):
        self.fourccs = data
        self.Create()
        self.Show()

    def Create(self):
        title = "A Better String Viewer"
        idaapi.simplecustviewer_t.Create(self, title)
        c = "%s %11s" % ("Address", "String")
        comment = idaapi.COLSTR(c, idaapi.SCOLOR_BINPREF)
        self.AddLine(comment)
        for item in self.fourccs:
            addy = item[0]
            string_d = item[1]
            address_element = idaapi.COLSTR("0x%08x " % addy, idaapi.SCOLOR_REG)
            str_element = idaapi.COLSTR("%-1s" % string_d, idaapi.SCOLOR_REG)
            line = address_element + "  " +  str_element
            self.AddLine(line)
        return True

    def OnDblClick(self, something):
        value = self.GetCurrentWord()
        if value[:2] == '0x':
            Jump(int(value, 16))
        return True

    def OnHint(self, lineno):
        if lineno < 2: return False
        else: lineno -= 2
        line = self.GetCurrentWord()
        if line == None: return False
        if "0x" not in line: return False
        # skip COLSTR formatting, find address
        addy = int(line, 16)
        disasm = idaapi.COLSTR(GetDisasm(addy) + "\n", idaapi.SCOLOR_DREF)
        return (1, disasm)


def enumerate_strings():
    display = []
    # iterate through all functions
    for func in idautils.Functions():
        flags = GetFunctionFlags(func)
        # ignore library code 
        if flags & FUNC_LIB or flags & FUNC_THUNK:
            continue
        # get a list of the addresses in the function. Using a range of < or >
        # is flawed when the code is obfuscated.
        dism_addr = list(FuncItems(func))
        # for each instruction in the function 
        for line in dism_addr:
            temp = None
            val_addr = 0
            if GetOpType(line,0) == 5:
                val_addr = GetOperandValue(line,0)
                temp = GetString(val_addr, -1)
            elif GetOpType(line,1) == 5:
                val_addr = GetOperandValue(line,1)
                temp = GetString(val_addr, -1)
            if temp:
                # in testing isCode() failed to accurately detect if address was code
                # decided to try something a little more generic 
                if val_addr not in dism_addr and GetFunctionName(val_addr) == '':
                    if GetStringType(val_addr) == 3:
                        temp = GetString(val_addr, -1, ASCSTR_UNICODE)
                    display.append((line, temp))

    # Get the strings already found
    # https://www.hex-rays.com/products/ida/support/idapython_docs/idautils.Strings-class.html
    s = idautils.Strings(False)
    s.setup(strtypes=Strings.STR_UNICODE | Strings.STR_C)
    for v in s:
        if v is not None:
            display.append((v.ea, str(v)))

    sorted_display = sorted(display, key=lambda tup:tup[0])
    return sorted_display


if __name__ == "__main__":
    ok = enumerate_strings()
    Viewer(ok)

Link to Repo

ex_pe_xor.py

For anyone else who doesn't want to manually carve out single-byte XOR-encoded executables.

C:\Documents and Settings\Administrator\Desktop\x>ex_pe_xor.py bad.bin
 * Encoded PE Found, Key 0x21, Offset 0x0
 * exe found at offset 0x0

C:\Documents and Settings\Administrator\Desktop\x>dir

04/30/2014  08:36 PM    <DIR>          .
04/30/2014  08:36 PM    <DIR>          ..
04/30/2014  08:36 PM            24,576 1.exe   <- carved
04/30/2014  05:44 PM            24,576 bad.bin
04/30/2014  08:06 PM             3,526 ex_pe_xor.py

The pefile module must be installed.
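The detection idea is a known-plaintext trick: a PE file starts with 'M', so XORing any byte with 'M' gives the key that would make that byte an 'M'. The candidate key is then checked against the 'Z', the e_lfanew field at offset 0x3c and the 'PE' signature. Below is a minimal sketch of that logic, written for Python 3 (the script itself is Python 2). The function name `find_xor_pe` and the synthetic stub are my own for illustration; they are not part of ex_pe_xor.py.

```python
import struct

def find_xor_pe(data):
    """Return (key, offset) of the first single-byte XOR-encoded PE header, or (None, None)."""
    for i in range(len(data) - 0x40):
        # Known plaintext: the first header byte must decode to 'M'.
        key = data[i] ^ ord('M')
        # A key of 0 means a plain, unencoded MZ header; skip it.
        if key == 0 or data[i + 1] ^ key != ord('Z'):
            continue
        # e_lfanew sits at offset 0x3c: a little-endian dword pointing at "PE".
        raw = bytes(b ^ key for b in data[i + 0x3c:i + 0x40])
        lfanew = struct.unpack('<I', raw)[0]
        sig = i + lfanew
        # The two signature bytes must lie inside the buffer.
        if sig + 2 > len(data):
            continue
        if data[sig] ^ key == ord('P') and data[sig + 1] ^ key == ord('E'):
            return key, i
    return None, None

# Synthetic 0x80-byte stub (not a real executable): "MZ", e_lfanew = 0x40, "PE\0\0".
stub = bytearray(0x80)
stub[0:2] = b'MZ'
stub[0x3c:0x40] = struct.pack('<I', 0x40)
stub[0x40:0x44] = b'PE\x00\x00'

# Three junk bytes, then the stub XORed with 0x21 (the key from the run above).
encoded = b'\xff' * 3 + bytes(b ^ 0x21 for b in stub)
print(find_xor_pe(encoded))  # (33, 3), i.e. key 0x21 at offset 3
```

With key 0x21 the encoded header starts with the bytes 0x6c 0x7b ("l{") instead of "MZ", which is why searching for a literal MZ misses it.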

## detects single byte xor encoding by searching for the 
## encoded MZ, lfanew and PE, then XORs the data and 
## uses pefile to extract the decoded executable. 
## written quickly/poorly by alexander hanel 

import sys
import struct
import pefile
import re
from StringIO import StringIO 

def get_xor():
    # read file into a bytearray
    byte = bytearray(open(sys.argv[1], 'rb').read())

    # for each byte in the file stream, excluding the last 256 bytes
    for i in range(0, len(byte) - 256):
            # KEY ^ VALUE ^ KEY = VALUE; Simple way to get the key 
            key = byte[i] ^ ord('M')
            # byte[i] decodes to 'M' by construction; verify the next byte decodes to 'Z'
            if byte[i+1] ^ key == ord('Z'):
                    # skip non-XOR encoded MZ
                    if key == 0:
                            continue
                    # read four bytes into temp, offset to PE aka lfanew
                    temp = byte[(i + 0x3c) : (i + 0x3c + 4)]
                    # decode values with key 
                    lfanew = []
                    for x in temp:
                            lfanew.append( x ^ key)
                    # convert from bytearray to int value, probably a better way to do this
                    pe_offset  = struct.unpack( '<i', str(bytearray(lfanew)))[0]
                    # verify the offset is not negative and the PE signature lies inside the file
                    if pe_offset < 0 or pe_offset + i + 1 >= len(byte):
                            continue
                    # verify the two decoded bytes are 'P' & 'E'
                    if byte[pe_offset + i ] ^ key == ord('P') and byte[pe_offset + 1 + i] ^ key == ord('E'):
                            print " * Encoded PE Found, Key 0x%x, Offset 0x%x" % (key, i)
                            return (key, i)
    return (None, None)

def getExt(pe):
        if pe.is_dll():
            return 'dll'
        if pe.is_driver():
            return 'sys'
        if pe.is_exe():
            return 'exe'
        return 'bin'
            
def writeFile(count, ext, pe):
        try:
            out  = open(str(count)+ '.' + ext, 'wb')
        except IOError:
            print '\t[FILE ERROR] could not write file'
            sys.exit()
        # remove overlay or junk in the trunk
        out.write(pe.trim())
        out.close()
                        
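def xor_data(key, offset):
        # decode the whole file with the key; offset is not needed here
        # because carve() searches the decoded data for MZ itself
        byte = bytearray(open(sys.argv[1], 'rb').read())
        return str(bytearray(x ^ key for x in byte))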
        
def carve(fileH):
        if type(fileH) is str:
            fileH = StringIO(fileH)
        c = 1
        # For each address that contains MZ
        for y in [tmp.start() for tmp in re.finditer('\x4d\x5a', fileH.read())]:
            fileH.seek(y)
            try:
                pe = pefile.PE(data=fileH.read())
            except:
                continue 
            # determine file ext
            ext = getExt(pe)
            print ' *', ext , 'found at offset', hex(y) 
            writeFile(c,ext,pe)
            c += 1
            ext = ''
            fileH.seek(0)
            pe.close()

def run():
    if len(sys.argv) < 2:
        print "Usage: ex_pe_xor.py <xored_data>"
        return 
    key, offset = get_xor()
    if key is None:
        return
    data = xor_data(key, offset)
    carve(data) 
    
run()
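
The decode step XORs one byte at a time. On larger files a 256-entry translation table does the same work in a single call. The sketch below is for Python 3 and only covers decoding and locating candidate MZ offsets; `xor_decode` and `mz_offsets` are hypothetical helper names, and each candidate would still need to be validated with pefile as the script does.

```python
import re

def xor_decode(data, key):
    """XOR every byte of data with a single-byte key using a lookup table."""
    table = bytes(b ^ key for b in range(256))
    return data.translate(table)

def mz_offsets(decoded):
    """Offsets of every 'MZ' marker; candidates only, not validated PE files."""
    return [m.start() for m in re.finditer(b'MZ', decoded)]

# Made-up sample data: two MZ markers hidden behind key 0x21.
encoded = bytes(b ^ 0x21 for b in b'junkMZ\x90\x00tailMZ')
decoded = xor_decode(encoded, 0x21)
print(mz_offsets(decoded))  # [4, 12]
```

Since XOR is its own inverse, calling `xor_decode` again with the same key gives back the encoded bytes.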