Remove eager symbol table parsing from File::open_stream

All section parsing is now done lazily, on demand, by the other File methods.

Also, update README.md and the lib.rs doc comment to reflect the current state of library development.
Christopher Cole 2022-10-20 22:45:27 -07:00
parent 4e706431e3
commit e9bc799ec7
No known key found for this signature in database
GPG Key ID: 0AC856975983E9DB
3 changed files with 91 additions and 116 deletions

106
README.md

@ -9,78 +9,76 @@ The `elf` crate provides a pure-rust interface for reading ELF object files.
[Documentation](https://docs.rs/elf/)
# `elf`
The `elf` crate provides an interface for reading ELF object files.
# Capabilities
**Contains no unsafe code**: Many of the other rust ELF parsers out there
contain bits of unsafe code deep down or in dependencies to
reinterpret/transmute byte contents as structures in order to drive
zero-copy parsing. They're slick, and that also introduces unsafe code
blocks (albeit small ones). This crate strives to serve as an alternate
implementation with zero unsafe code blocks.
### Endian-aware:
This crate properly handles translating between file and host endianness
when parsing the ELF contents.
**Endian-aware**: This crate properly handles translating between file and
host endianness when parsing the ELF contents.
**Lazy parsing**: This crate strives for lazy evaluation and parsing when
possible. For example, the `SymbolTable` simply acts as an interpretation layer
on top of a `&[u8]`. Parsing of `Symbol`s takes place only when symbols are
### Lazy parsing:
This crate strives for lazy evaluation and parsing when possible.
[File::open_stream()][File::open_stream] reads, parses and validates the ELF
File Header, then stops there. All other i/o and parsing is deferred to
being performed on-demand by other methods on [File]. For example,
[File::symbol_table()](File::symbol_table) reads the data for the symbol
table and associated string table then returns them with types like
[SymbolTable](symbol::SymbolTable) and
[StringTable](string_table::StringTable) which simply act as an
interpretation layer on top of `&[u8]`s, where parsing of
[Symbol](symbol::Symbol)s and strings take place only when they are
requested.
**Tiny compiled library size**: At the time of writing this, the release crate
was only ~30kB!
### Lazy i/o:
This crate provides two ways of parsing ELF files:
* From a `&[u8]` into which the user has already read the full contents of the file
* From a Read + Seek (such as a [std::fs::File]) where file contents are read
lazily on-demand based on what the user wants to inspect.
These allow you to decide what tradeoff you want to make. If you're going to be working
with the whole file at once, then the byte slice approach is probably worthwhile to minimize
i/o overhead by streaming the whole file into memory at once. If you're only going to
be inspecting part of the file, then the Read + Seek approach would help avoid the
overhead of reading a bunch of unused file data just to parse out a few things.
### No unsafe code:
Many of the other rust ELF parsers out there contain bits of unsafe code
deep down or in dependencies to reinterpret/transmute byte contents as
structures in order to drive zero-copy parsing. They're slick, and there's
typically appropriate checking to validate the assumptions to make that
unsafe code work, but nevertheless it introduces unsafe code blocks (albeit
small ones). This crate strives to serve as an alternate implementation with
zero unsafe code blocks.
# Future plans
**Add no_std option**: Currently, the main impediment to a no_std option is the
use of allocating datastructures, such as the parsed section contents' `Vec<u8>`.
**Lazily loading section contents**: Currently, all of the section data is read
from the input stream into allocated `Vec<u8>` when the stream is opened. This can
be unnecessarily expensive for use-cases that don't need to inspect all the section
contents.
A potential future vision for both of these issues is to rework the parsing
code's reader trait implementations to provide two options:
* A wrapper around a `&[u8]` which already contains the full ELF contents. This
could be used for a no_std option where we want to simply parse out the ELF
structures from the existing data without needing to heap-allocate buffers
in which to store the reads.
* An allocating CachedReader type which wraps a stream which can allocate
and remember `Vec<u8>` buffers in which to land the data from file reads.
The former no_std option is useful when you need `no_std`, however it forces
the user to invoke the performance penalty of reading the entire file
contents up front.
The latter option is useful for use-cases that only want to interpret parts
of an ELF object, where allocating buffers to store the reads is a much
smaller cost than reading in the whole large object contents.
**Add no_std option** This would disable the Read + Seek interface and limit
the library to the `&[u8]` parsing impl.
## Example:
```rust
extern crate elf;
fn main() {
let path: std::path::PathBuf = From::from("some_file");
let mut io = match std::fs::File::open(path) {
Ok(f) => f,
Err(e) => panic!("Error: {:?}", e),
};
let path = std::path::PathBuf::from("some_file");
let elf_file = match elf::File::open_stream(&mut io) {
Ok(f) => f,
Err(e) => panic!("Error: {:?}", e),
};
let file_data = std::fs::read(path).expect("Could not read file.");
println!("ELF: {}", elf_file.ehdr);
let mut file = elf::File::open_stream(file_data.as_slice()).expect("Could not parse ELF Header");
let text_scn = match elf_file.sections.get_by_name(".text") {
Some(s) => s,
None => panic!("Failed to find .text section"),
};
let (symtab, strtab) = file
.symbol_table()
.expect("Failed to read symbol table")
.expect("File contained no symbol table");
let symbol = symtab.get(30).expect("Failed to get symbol");
let symbol_name = strtab
.get(symbol.st_name as usize)
.expect("Failed to get name from strtab");
println!("{:?}", text_scn.data);
println!("{symbol_name}: {symbol}");
}
```
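
The "Lazy parsing" and "Endian-aware" capabilities described in the README text above can be illustrated with a minimal sketch. This is not the crate's implementation: `Endian` and `LazyU32Table` are hypothetical names invented for this example, and the table holds plain `u32` entries rather than ELF symbols. The point is only that construction stores a borrowed `&[u8]` and does no parsing, while each `get` call decodes one entry in the file's byte order.

```rust
/// Byte order of the file being parsed (hypothetical type for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Endian {
    Little,
    Big,
}

/// Hypothetical lazy table of fixed-size `u32` entries over borrowed bytes.
/// Constructing it parses nothing; entries are decoded only when requested.
pub struct LazyU32Table<'data> {
    endian: Endian,
    data: &'data [u8],
}

impl<'data> LazyU32Table<'data> {
    const ENTRY_SIZE: usize = 4;

    pub fn new(endian: Endian, data: &'data [u8]) -> Self {
        LazyU32Table { endian, data }
    }

    /// Number of whole entries available in the underlying bytes.
    pub fn len(&self) -> usize {
        self.data.len() / Self::ENTRY_SIZE
    }

    /// Decode entry `index` on demand; `None` if it is out of bounds.
    pub fn get(&self, index: usize) -> Option<u32> {
        let start = index.checked_mul(Self::ENTRY_SIZE)?;
        let end = start.checked_add(Self::ENTRY_SIZE)?;
        let b = self.data.get(start..end)?;
        let bytes = [b[0], b[1], b[2], b[3]];
        Some(match self.endian {
            Endian::Little => u32::from_le_bytes(bytes),
            Endian::Big => u32::from_be_bytes(bytes),
        })
    }
}
```

Because the table only borrows the bytes, creating it costs the same regardless of how many entries it covers; the decoding cost is paid per lookup.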

View File

@ -8,19 +8,13 @@ use crate::{gabi, string_table};
pub struct File<R: ReadBytesAt> {
reader: R,
pub ehdr: FileHeader,
sections: section::SectionTable,
}
impl<R: ReadBytesAt> File<R> {
pub fn open_stream(mut reader: R) -> Result<File<R>, ParseError> {
let ehdr = FileHeader::parse(&mut reader)?;
let table = section::SectionTable::parse(&ehdr, &mut reader)?;
Ok(File {
reader,
ehdr,
sections: table,
})
Ok(File { reader, ehdr })
}
/// Get an iterator over the Segments (ELF Program Headers) in the file
@ -143,10 +137,6 @@ impl<R: ReadBytesAt> File<R> {
self.section_data_as_strtab(&strtab_shdr)
}
pub fn sections(&self) -> Result<&section::SectionTable, ParseError> {
Ok(&self.sections)
}
fn get_symbol_table_of_type(
&mut self,
symtab_type: section::SectionType,
@ -680,12 +670,7 @@ mod interface_tests {
let io = std::fs::File::open(path).expect("Could not open file.");
let mut c_io = CachedReadBytes::new(io);
let file = File::open_stream(&mut c_io).expect("Open test1");
let bss = file
.sections()
.expect("Failed to get section table")
.get_by_name(".bss")
.expect("Could not find .bss section");
assert!(bss.data.iter().all(|&b| b == 0));
assert_eq!(file.ehdr.elftype, ObjectFileType(gabi::ET_EXEC));
}
#[test]
@ -694,12 +679,7 @@ mod interface_tests {
let file_data = std::fs::read(path).expect("Could not read file.");
let slice = file_data.as_slice();
let file = File::open_stream(slice).expect("Open test1");
let bss = file
.sections()
.expect("Failed to get section table")
.get_by_name(".bss")
.expect("Could not find .bss section");
assert!(bss.data.iter().all(|&b| b == 0));
assert_eq!(file.ehdr.elftype, ObjectFileType(gabi::ET_EXEC));
}
#[test]
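
One consequence of the `open_stream` change above is that opening a file now validates only the header, so problems in later structures surface when the corresponding method is called rather than at open time. The following is a rough sketch of that pattern under simplified assumptions; `Header`, `LazyFile`, `section_sizes`, and the one-byte "format" are all hypothetical and are not part of the `elf` crate.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Truncated,
}

/// Hypothetical stand-in for a parsed file header: one byte giving the
/// number of one-byte "section size" records that follow it.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub section_count: u8,
}

pub struct LazyFile<'data> {
    data: &'data [u8],
    pub header: Header,
}

impl<'data> LazyFile<'data> {
    /// Parse and validate only the header, then stop.
    pub fn open(data: &'data [u8]) -> Result<Self, ParseError> {
        let section_count = *data.first().ok_or(ParseError::Truncated)?;
        Ok(LazyFile {
            data,
            header: Header { section_count },
        })
    }

    /// Locate the section records on demand; errors appear here, not in `open`.
    pub fn section_sizes(&self) -> Result<&'data [u8], ParseError> {
        let end = 1 + usize::from(self.header.section_count);
        self.data.get(1..end).ok_or(ParseError::Truncated)
    }
}
```

In this sketch an input whose header claims more records than the data holds still opens successfully, and the truncation is reported only by `section_sizes`. Callers of a lazily parsed file need to handle errors at each access point for the same reason.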

View File

@ -4,51 +4,48 @@
//!
//! # Capabilities
//!
//! **Contains no unsafe code**: Many of the other rust ELF parsers out there
//! contain bits of unsafe code deep down or in dependencies to
//! reinterpret/transmute byte contents as structures in order to drive
//! zero-copy parsing. They're slick, and that also introduces unsafe code
//! blocks (albeit small ones). This crate strives to serve as an alternate
//! implementation with zero unsafe code blocks.
//! ### Endian-aware:
//! This crate properly handles translating between file and host endianness
//! when parsing the ELF contents.
//!
//! **Endian-aware**: This crate properly handles translating between file and
//! host endianness when parsing the ELF contents.
//! ### Lazy parsing:
//! This crate strives for lazy evaluation and parsing when possible.
//! [File::open_stream()][File::open_stream] reads, parses and validates the ELF
//! File Header, then stops there. All other i/o and parsing is deferred to
//! being performed on-demand by other methods on [File]. For example,
//! [File::symbol_table()](File::symbol_table) reads the data for the symbol
//! table and associated string table then returns them with types like
//! [SymbolTable](symbol::SymbolTable) and
//! [StringTable](string_table::StringTable) which simply act as an
//! interpretation layer on top of `&[u8]`s, where parsing of
//! [Symbol](symbol::Symbol)s and strings take place only when they are
//! requested.
//!
//! **Lazy parsing**: This crate strives for lazy evaluation and parsing when
//! possible. For example, the [SymbolTable](symbol::SymbolTable) simply
//! acts as an interpretation layer on top of a `&[u8]`. Parsing of
//! [Symbol](symbol::Symbol)s takes place only when symbols are requested.
//! ### Lazy i/o:
//! This crate provides two ways of parsing ELF files:
//! * From a `&[u8]` into which the user has already read the full contents of the file
//! * From a Read + Seek (such as a `std::fs::File`) where file contents are read
//! lazily on-demand based on what the user wants to inspect.
//!
//! **Tiny compiled library size**: At the time of writing this, the release lib
//! was only ~30kB!
//! These allow you to decide what tradeoff you want to make. If you're going to be working
//! with the whole file at once, then the byte slice approach is probably worthwhile to minimize
//! i/o overhead by streaming the whole file into memory at once. If you're only going to
//! be inspecting part of the file, then the Read + Seek approach would help avoid the
//! overhead of reading a bunch of unused file data just to parse out a few things.
//!
//! ### No unsafe code:
//! Many of the other rust ELF parsers out there contain bits of unsafe code
//! deep down or in dependencies to reinterpret/transmute byte contents as
//! structures in order to drive zero-copy parsing. They're slick, and there's
//! typically appropriate checking to validate the assumptions to make that
//! unsafe code work, but nevertheless it introduces unsafe code blocks (albeit
//! small ones). This crate strives to serve as an alternate implementation with
//! zero unsafe code blocks.
//!
//! # Future plans
//!
//! **Add no_std option**: Currently, the main impediment to a no_std option is the
//! use of allocating datastructures, such as the parsed section contents' Vec<u8>.
//!
//! **Lazily loading section contents**: Currently, all of the section data is read
//! from the input stream into allocated Vec<u8> when the stream is opened. This can
//! be unnecessarily expensive for use-cases that don't need to inspect all the section
//! contents.
//!
//! A potential future vision for both of these issues is to rework the parsing
//! code's reader trait implementations to provide two options:
//!
//! * A wrapper around a `&[u8]` which already contains the full ELF contents. This
//! could be used for a no_std option where we want to simply parse out the ELF
//! structures from the existing data without needing to heap-allocate buffers
//! in which to store the reads.
//! * An allocating CachedReader type which wraps a stream which can allocate
//! and remember Vec<u8> buffers in which to land the data from file reads.
//!
//! The former no_std option is useful when you need no_std, however it forces
//! the user to invoke the performance penalty of reading the entire file
//! contents up front.
//!
//! The latter option is useful for use-cases that only want to interpret parts
//! of an ELF object, where allocating buffers to store the reads is a much
//! smaller cost than reading in the whole large object contents.
//! **Add no_std option** This would disable the Read + Seek interface and limit
//! the library to the `&[u8]` parsing impl.
//!
pub mod file;
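
The "Lazy i/o" tradeoff described in the doc comment above can be sketched with the standard library alone. The helper below, `read_range`, is a hypothetical function written for this illustration and is not the crate's reader trait; it shows how a `Read + Seek` source lets a parser fetch only the byte range it needs instead of loading the whole file.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: read exactly `len` bytes starting at `offset`
/// from any seekable reader, without touching the rest of the stream.
pub fn read_range<R: Read + Seek>(
    reader: &mut R,
    offset: u64,
    len: usize,
) -> io::Result<Vec<u8>> {
    // Jump directly to the requested position.
    reader.seek(SeekFrom::Start(offset))?;
    // Allocate a buffer only as large as the requested range.
    let mut buf = vec![0u8; len];
    // Fails with an error if the stream ends before `len` bytes are read.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

With a `std::fs::File` as the reader, each call costs one seek and one bounded read, which suits inspecting a few structures in a large object. When most of the file will be visited anyway, reading it once into a `Vec<u8>` and parsing from the slice avoids repeated i/o calls.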