Determine target ISA extensions of binary file in Linux (library or executable)
We have an issue related to a Java application running under a (rather old) FC3 on an Advantech POS board with a Via C3 processor. The java application has several compiled shared libs that are accessed via JNI.
Via C3 processor is supposed to be i686 compatible. Some time ago after installing Ubuntu 6.10 on a MiniItx board with the same processor, I found out that the previous statement is not 100% true. The Ubuntu kernel hanged on startup due to the lack of some specific and optional instructions of the i686 set in the C3 processor. These instructions missing in C3 implementation of i686 set are used by default by GCC compiler when using i686 optimizations. The solution, in this case, was to go with an i386 compiled version of Ubuntu distribution.
The base problem with the Java application is that the FC3 distribution was installed on the HD by cloning from an image of the HD of another PC, this time an Intel P4. Afterwards, the distribution needed some hacking to have it running such as replacing some packages (such as the kernel one) with the i386 compiled version.
The problem is that after working for a while the system completely hangs without a trace. I am afraid that some i686 code is left somewhere in the system and could be executed randomly at any time (for example after recovering from suspend mode or something like that).
My question is:
- Is there any tool or way to find out at what specific architecture extensions a binary file (executable or library) requires? file does not give enough information.
I think you need a tool that checks every instruction, to determine exactly which set it belongs to. Is there even an offical name for the specific set of instructions implemented by the C3 processor? If not, it's even hairier.
A quick'n'dirty variant might be to do a raw search in the file, if you can determine the bit pattern of the disallowed instructions. Just test for them directly, could be done by a simple objdump | grep chain, for instance.
The unix.linux file command is great for this. It can generally detect the target architecture and operating system for a given binary (and has been maintained on and off since 1973. wow!)
Of course, if you're not running under unix/linux - you're a bit stuck. I'm currently trying to find a java based port that I can call at runtime.. but no such luck.
The unix file command gives information like this:
hex: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.4.17, not stripped
More detailed information about the details of the architecture are hinted at with the (unix) objdump -f <fileName> command which returns:
architecture: arm, flags 0x00000112: EXEC_P, HAS_SYMS, D_PAGED start address 0x0000876c
This executable was compiled by a gcc cross compiler (compiled on an i86 machine for the ARM processor as a target)
I decide to add one more solution for any, who got here: personally in my case the information provided by the file and objdump wasn't enough, and the grep isn't much of a help -- I resolve my case through the readelf -a -W.
Note, that this gives you pretty much info. The arch related information resides in the very beginning and the very end. Here's an example:
ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: ARM Version: 0x1 Entry point address: 0x83f8 Start of program headers: 52 (bytes into file) Start of section headers: 2388 (bytes into file) Flags: 0x5000202, has entry point, Version5 EABI, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 8 Size of section headers: 40 (bytes) Number of section headers: 31 Section header string table index: 28 ... Displaying notes found at file offset 0x00000148 with length 0x00000020: Owner Data size Description GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 2.6.16 Attribute Section: aeabi File Attributes Tag_CPU_name: "7-A" Tag_CPU_arch: v7 Tag_CPU_arch_profile: Application Tag_ARM_ISA_use: Yes Tag_THUMB_ISA_use: Thumb-2 Tag_FP_arch: VFPv3 Tag_Advanced_SIMD_arch: NEONv1 Tag_ABI_PCS_wchar_t: 4 Tag_ABI_FP_rounding: Needed Tag_ABI_FP_denormal: Needed Tag_ABI_FP_exceptions: Needed Tag_ABI_FP_number_model: IEEE 754 Tag_ABI_align_needed: 8-byte Tag_ABI_align_preserved: 8-byte, except leaf SP Tag_ABI_enum_size: int Tag_ABI_HardFP_use: SP and DP Tag_CPU_unaligned_access: v6
To answer the ambiguity of whether a Via C3 is a i686 class processor: It's not, it's an i586 class processor.
Cyrix never produced a true 686 class processor, despite their marketing claims with the 6x86MX and MII parts. Among other missing instructions, two important ones they didn't have were CMPXCHG8b and CPUID, which were required to run Windows XP and beyond.
National Semiconductor, AMD and VIA have all produced CPU designs based on the Cyrix 5x86/6x86 core (NxP MediaGX, AMD Geode, VIA C3/C7, VIA Corefusion, etc.) which have resulted in oddball designs where you have a 586 class processor with SSE1/2/3 instruction sets.
My recommendation if you come across any of the CPUs listed above and it's not for a vintage computer project (ie. Windows 98SE and prior) then run screaming away from it. You'll be stuck on slow i386/486 Linux or have to recompile all of your software with Cyrix specific optimizations.
Expanding upon @Hi-Angel's answer I found an easy way to check the bit width of a static library:
readelf -a -W libsomefile.a | grep Class: | sort | uniq
Where libsomefile.a is my static library. Should work for other ELF files as well.
Quickest thing to find architecture would be to execute:
objdump -f testFile | grep architecture
This works even for binary.