The above function is very simple. Let's ignore the actual code but think about the codes functionality from a generic standpoint. The code pushes arguments on to the stack, calls APIs, compares return values from the APIs and then returns one or zero. In most instance these simple functions do not need to be analyzed. By reading the API names most of the functionality can be inferred and easily renamed to something like "RegCreateAndSetValue". After seeing these simple functions many times I realized that many of these functions could automatically be renamed. If broken down into steps it would look like this.
- API names from a function are extracted
- Sub-strings from the APIs are extracted
- Search for a common sub-string throughout all API names.
- If a sub-string is common throughout all, create a name from the sub-strings.
Step 2 & 3
A lot of the heavy lifting for parsing out the sub-strings is built into the tokenizer module in TT&SS. For more information and usage I'd recommend the following post.
If we were to apply that logic plus some other random stuff we would have the following..
I think this is pretty cool. I like the idea of combining other domains of knowledge such as Natural Language Text Processing to reversing. Sadly functions simple functions or APIs that all contain a similar sub-string are rare. The rarity happens because a lot APIs that share similar functionality use generic APIs such as "CloseHandle" to close out a process. This API does not contain any of the sub-strings so it will fail the similarity test. I'm currently toying with an idea of using thresholds on matches or whitelisting certain APIs. Creating API sets as was used in the generic renaming of functions in IDAScope is another option. The main issue with that approach is categorizing of APIs by functionality. There are lot of little things for this project hence why I'm releasing it as a POC. Below is the output of attempting to rename 456 functions in a Zeus IDB.
The VirtualProtect2 contains a "2" because the API was called twice from a function. The API names that end with an underscore and a value are for calculated names that happen multiple times.
The source POC is named w_sims.py and can be found in the POCS dir in the repo. The source also contains some code to identify wrapper functions. The code is currently setup to run the SimilarFunctions and the wrapper class on all the known functions. If you would like to run the wrapper class or experiment on other function tweak the execute options at the bottom of the code. The code is still being tweaked and fixed. I have been using this code off and on for a couple of weeks. I have seen some issues while importing the modules but I think I got those ironed out. If anything breaks, you have feedback etc please leave a comment.