Renaming Simple Functions

Simple Function

The above function is very simple. Let's ignore the actual code but think about the codes functionality from a generic standpoint. The code pushes arguments on to the stack, calls APIs, compares return values from the APIs and then returns one or zero. In most instance these simple functions do not need to be analyzed. By reading the API names most of the functionality can be inferred and easily renamed to something like "RegCreateAndSetValue".  After seeing these simple functions many times I realized that many of these functions could automatically be renamed. If broken down into steps it would look like this.
  1. API names from a function are extracted
  2. Sub-strings from the APIs are extracted
  3. Search for a common sub-string throughout all API names. 
  4. If a sub-string is common throughout all, create a name from the sub-strings. 
Step 1

    def get_apis(self, func_addr):
        flags = GetFunctionFlags(func_addr)
        # ignore library functions
        if flags & FUNC_LIB or flags & FUNC_THUNK:
            logging.debug("get_apis: Library code or thunk")
            return None
        # list of addresses
        dism_addr = list(FuncItems(func_addr))
        for instr in dism_addr:
            tmp_api_address = ""
            if idaapi.is_call_insn(instr):
                # In theory an API address should only have one xrefs
                # The xrefs approach was used because I could not find how to
                # get the API name by address.
                for xref in XrefsFrom(instr, idaapi.XREF_FAR):
                    if xref.to == None:
                        self.calls += 1
                        continue
                    tmp_api_address = xref.to
                    logging.debug("get_apis: xref to %x found", tmp_api_address)
                    break
                # get next instr since api address could not be found
                if tmp_api_address == "":
                    self.calls += 1
                    continue
                api_flags = GetFunctionFlags(tmp_api_address)
                # check for lib code (api)
                if api_flags & idaapi.FUNC_LIB == True or api_flags & idaapi.FUNC_THUNK:
                    tmp_api_name = NameEx(0, tmp_api_address)
                    if tmp_api_name:
                        self.apis.append(tmp_api_name)
                else:
                    self.calls += 1

Step 2 & 3

    def match_apis(self):
        self.matched = False
        api_set = set(self.apis)
        # Optional Threshold. Only check functions with more than 2 apis
        if self.calls <= self.threshold and len(self.apis) > 1:
            api_tokend  = []
            # for each api in function
            for api_name in api_set:
                # for each tokenized string in API name
                for item in self.tok.tokenizer(api_name):
                    if item is None or item is "A" or item is "W":
                        continue
                    api_tokend.append(item)
            # Count occurrence of strings.
            count_tmp = Counter(api_tokend)
            # if a common string is found in all APIs
            # return True and the count strings
            for string, count in count_tmp.items():
                if count == len(set(self.apis)):
                    self.matched = True
                    self.count_strings = count_tmp
                else:
                    logging.debug("match_apis: API count and API sub-string don't match")
        else:
            logging.debug("match_apis: calls above threshold or API count is 1")

A lot of the heavy lifting for parsing out the sub-strings is built into the tokenizer module in TT&SS. For more information and usage I'd recommend the following post

Step 4


    def create_string(self):
        if self.count_strings == "" or self.matched is False:
            return
        # Sort strings by highest occurrence
        sort = sorted(self.count_strings, key=self.count_strings.get, reverse=True)
        name = ""
        # if a function contains all the same API multiple times
        # might be possible to modify to deal with wrapper code also
        if self.calls == 0 and len(set(self.apis)) == 1 and len(self.apis) > 1:
            self.func_name = self.apis[0] + str(len(self.apis))
            return
        for each in sort:
            # ignore Wide or Ascii
            if each.upper() == "A" or each.upper() == "W":
                continue
            # Convert to CamelCase for easier reading and space
            tmp = each[0].upper() + each[1:]
            name += str(tmp)
        # replace white space with underscore
        name = name.replace(" ", "_")
        logging.debug("create_string: string created %s", name)
        self.func_name = name

If we were to apply that logic plus some other random stuff we would have the following..

I think this is pretty cool. I like the idea of combining other domains of knowledge such as Natural Language Text Processing to reversing. Sadly functions  simple functions or APIs that all contain a similar sub-string are rare. The rarity happens because a lot APIs that share similar functionality use generic APIs such as "CloseHandle" to close out a process. This API does not contain any of the sub-strings so it will fail the similarity test. I'm currently toying with an idea of using thresholds on matches or whitelisting certain APIs. Creating API sets as was used in the generic renaming of functions in IDAScope is another option. The main issue with that approach is categorizing of APIs by functionality. There are lot of little things for this project hence why I'm releasing it as a POC.  Below is the output of attempting to rename 456 functions in a Zeus IDB.


The VirtualProtect2 contains a "2" because the API was called twice from a function. The API names that end with an underscore and a value are for calculated names that happen multiple times.

The source POC is named w_sims.py and can be found in the POCS dir in the repo. The source also contains some code to identify wrapper functions. The code is currently setup to run the SimilarFunctions and the wrapper class on all the known functions. If you would like to run the wrapper class or experiment on other function tweak the execute options at the bottom of the code. The code is still being tweaked and fixed. I have been using this code off and on for a couple of weeks. I have seen some issues while importing the modules but  I think I got those ironed out. If anything breaks, you have feedback etc please leave a comment. 

No comments:

Post a Comment