Remove eager symbol table parsing from File::open_stream
Now, all section parsing is done lazily on-demand by the other File methods. Also, update README.md and the lib.rs doc comment to reflect the current library development state.
parent 4e706431e3
commit e9bc799ec7
106 README.md
@@ -9,78 +9,76 @@ The `elf` crate provides a pure-rust interface for reading ELF object files.
 
 [Documentation](https://docs.rs/elf/)
 
 # `elf`
 
 The `elf` crate provides an interface for reading ELF object files.
 
 # Capabilities
 
-**Contains no unsafe code**: Many of the other rust ELF parsers out there
-contain bits of unsafe code deep down or in dependencies to
-reinterpret/transmute byte contents as structures in order to drive
-zero-copy parsing. They're slick, and that also introduces unsafe code
-blocks (albeit small ones). This crate strives to serve as an alternate
-implementation with zero unsafe code blocks.
+### Endian-aware:
+This crate properly handles translating between file and host endianness
+when parsing the ELF contents.
 
-**Endian-aware**: This crate properly handles translating between file and
-host endianness when parsing the ELF contents.
-
-**Lazy parsing**: This crate strives for lazy evaluation and parsing when
-possible. For example, the `SymbolTable` simply acts as an interpretation layer
-on top of a `&[u8]`. Parsing of `Symbol`s takes place only when symbols are
+### Lazy parsing:
+This crate strives for lazy evaluation and parsing when possible.
+[File::open_stream()][File::open_stream] reads, parses and validates the ELF
+File Header, then stops there. All other i/o and parsing is deferred to
+being performed on-demand by other methods on [File]. For example,
+[File::symbol_table()](File::symbol_table) reads the data for the symbol
+table and associated string table then returns them with types like
+[SymbolTable](symbol::SymbolTable) and
+[StringTable](string_table::StringTable) which simply act as an
+interpretation layer on top of `&[u8]`s, where parsing of
+[Symbol](symbol::Symbol)s and strings take place only when they are
 requested.
 
-**Tiny compiled library size**: At the time of writing this, the release crate
-was only ~30kB!
+### Lazy i/o:
+This crate provides two ways of parsing ELF files:
+* From a `&[u8]` into which the user has already read the full contents of the file
+* From a Read + Seek (such as a [std::file::File]) where file contents are read
+  lazily on-demand based on what the user wants to inspect.
+
+These allow you to decide what tradeoff you want to make. If you're going to be working
+with the whole file at once, then the byte slice approach is probably worthwhile to minimize
+i/o overhead by streaming the whole file into memory at once. If you're only going to
+be inspecting part of the file, then the Read + Seek approach would help avoid the
+overhead of reading a bunch of unused file data just to parse out a few things.
+
+### No unsafe code:
+Many of the other rust ELF parsers out there contain bits of unsafe code
+deep down or in dependencies to reinterpret/transmute byte contents as
+structures in order to drive zero-copy parsing. They're slick, and there's
+typically appropriate checking to validate the assumptions to make that
+unsafe code work, but nevertheless it introduces unsafe code blocks (albeit
+small ones). This crate strives to serve as an alternate implementation with
+zero unsafe code blocks.
 
 # Future plans
 
-**Add no_std option**: Currently, the main impediment to a no_std option is the
-use of allocating datastructures, such as the parsed section contents' `Vec<u8>`.
-
-**Lazily loading section contents**: Currently, all of the section data is read
-from the input stream into allocated `Vec<u8>` when the stream is opened. This can
-be unnecessarily expensive for use-cases that don't need to inspect all the section
-contents.
-
-A potential future vision for both of these issues is to rework the parsing
-code's reader trait implementations to provide two options:
-
-* A wrapper around a `&[u8]` which already contains the full ELF contents. This
-  could be used for a no_std option where we want to simply parse out the ELF
-  structures from the existing data without needing to heap-allocate buffers
-  in which to store the reads.
-* An allocating CachedReader type which wraps a stream which can allocate
-  and remember `Vec<u8>` buffers in which to land the data from file reads.
-
-The former no_std option is useful when you need `no_std`, however it forces
-the user to invoke the performance penalty of reading the entire file
-contents up front.
-
-The latter option is useful for use-cases that only want to interpret parts
-of an ELF object, where allocating buffers to store the reads is a much
-smaller cost than reading in the whole large object contents.
+**Add no_std option** This would disable the Read + Seek interface and limit
+the library to the `&[u8]` parsing impl.
 
 ## Example:
 ```rust
-extern crate elf;
-
 fn main() {
-    let path: std::path::PathBuf = From::from("some_file");
-    let mut io = match std::fs::File::open(path) {
-        Ok(f) => f,
-        Err(e) => panic!("Error: {:?}", e),
-    };
-
-    let elf_file = match elf::File::open_stream(&mut io) {
-        Ok(f) => f,
-        Err(e) => panic!("Error: {:?}", e),
-    };
-
-    println!("ELF: {}", elf_file.ehdr);
-
-    let text_scn = match elf_file.sections.get_by_name(".text") {
-        Some(s) => s,
-        None => panic!("Failed to find .text section"),
-    };
-
-    println!("{:?}", text_scn.data);
+    let path = std::path::PathBuf::from("some_file");
+    let file_data = std::fs::read(path).expect("Could not read file.").as_slice();
+
+    let mut file = File::open_stream(file_data).expect("Could not parse ELF Header");
+
+    let (symtab, strtab) = file
+        .symbol_table()
+        .expect("Failed to read symbol table")
+        .expect("File contained no symbol table");
+    let symbol = symtab.get(30).expect("Failed to get symbol");
+    let symbol_name = strtab
+        .get(symbol.st_name as usize)
+        .expect("Failed to get name from strtab");
+
+    println!("{symbol_name}: {symbol}");
 }
 ```
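The README's "endian-aware, no unsafe" claims above come down to assembling multi-byte integers from raw bytes with safe conversions rather than transmutes. A minimal sketch of that idea follows; the `Endian` enum and `parse_u32_at` helper here are illustrative stand-ins, not the crate's actual API:

```rust
// Hypothetical endian-aware parsing helper built entirely from safe code.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Endian {
    Little,
    Big,
}

// Reads a u32 at `offset` from `data`, honoring the file's endianness.
// Returns None instead of panicking when the slice is too short.
fn parse_u32_at(endian: Endian, offset: usize, data: &[u8]) -> Option<u32> {
    let bytes: [u8; 4] = data.get(offset..offset.checked_add(4)?)?.try_into().ok()?;
    Some(match endian {
        Endian::Little => u32::from_le_bytes(bytes),
        Endian::Big => u32::from_be_bytes(bytes),
    })
}

fn main() {
    // ELF magic followed by the bytes [1, 0, 0, 0].
    let data = [0x7f, b'E', b'L', b'F', 0x01, 0x00, 0x00, 0x00];
    assert_eq!(parse_u32_at(Endian::Little, 4, &data), Some(1));
    assert_eq!(parse_u32_at(Endian::Big, 4, &data), Some(0x0100_0000));
    // Out-of-bounds reads surface as None, not as UB or a panic.
    assert_eq!(parse_u32_at(Endian::Little, 6, &data), None);
    println!("ok");
}
```

The `from_le_bytes`/`from_be_bytes` pair is what lets a parser translate between file and host endianness without any `unsafe` reinterpretation of the buffer.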
26 src/file.rs
@@ -8,19 +8,13 @@ use crate::{gabi, string_table};
 
 pub struct File<R: ReadBytesAt> {
     reader: R,
     pub ehdr: FileHeader,
-    sections: section::SectionTable,
 }
 
 impl<R: ReadBytesAt> File<R> {
     pub fn open_stream(mut reader: R) -> Result<File<R>, ParseError> {
         let ehdr = FileHeader::parse(&mut reader)?;
-        let table = section::SectionTable::parse(&ehdr, &mut reader)?;
-
-        Ok(File {
-            reader,
-            ehdr,
-            sections: table,
-        })
+        Ok(File { reader, ehdr })
     }
 
     /// Get an iterator over the Segments (ELF Program Headers) in the file
@@ -143,10 +137,6 @@ impl<R: ReadBytesAt> File<R> {
         self.section_data_as_strtab(&strtab_shdr)
     }
 
-    pub fn sections(&self) -> Result<&section::SectionTable, ParseError> {
-        Ok(&self.sections)
-    }
-
     fn get_symbol_table_of_type(
         &mut self,
         symtab_type: section::SectionType,
@@ -680,12 +670,7 @@ mod interface_tests {
         let io = std::fs::File::open(path).expect("Could not open file.");
         let mut c_io = CachedReadBytes::new(io);
         let file = File::open_stream(&mut c_io).expect("Open test1");
-        let bss = file
-            .sections()
-            .expect("Failed to get section table")
-            .get_by_name(".bss")
-            .expect("Could not find .bss section");
-        assert!(bss.data.iter().all(|&b| b == 0));
         assert_eq!(file.ehdr.elftype, ObjectFileType(gabi::ET_EXEC));
     }
 
     #[test]
@@ -694,12 +679,7 @@ mod interface_tests {
         let file_data = std::fs::read(path).expect("Could not read file.");
         let slice = file_data.as_slice();
         let file = File::open_stream(slice).expect("Open test1");
-        let bss = file
-            .sections()
-            .expect("Failed to get section table")
-            .get_by_name(".bss")
-            .expect("Could not find .bss section");
-        assert!(bss.data.iter().all(|&b| b == 0));
         assert_eq!(file.ehdr.elftype, ObjectFileType(gabi::ET_EXEC));
     }
 
     #[test]
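The `open_stream` change above is the classic lazy-handle pattern: the constructor parses and validates only the fixed-size header, keeps the reader, and defers all other i/o to later method calls. A toy sketch of the same shape against a made-up 4-byte header format (the `LazyFile` type and "MF" format here are invented for illustration, not the crate's real code):

```rust
use std::io::{Cursor, Error, ErrorKind, Read, Seek, SeekFrom};

struct Header {
    version: u16,
    body_len: u16,
}

struct LazyFile<R: Read + Seek> {
    reader: R,
    header: Header,
}

impl<R: Read + Seek> LazyFile<R> {
    // Like File::open_stream after this commit: read and validate the
    // header, then stop. No body/section data is touched here.
    fn open_stream(mut reader: R) -> std::io::Result<LazyFile<R>> {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        if &buf[..2] != b"MF" {
            return Err(Error::new(ErrorKind::InvalidData, "bad magic"));
        }
        let header = Header {
            version: buf[2] as u16,
            body_len: buf[3] as u16,
        };
        Ok(LazyFile { reader, header })
    }

    // I/o for the body happens only when a caller actually asks for it.
    fn body(&mut self) -> std::io::Result<Vec<u8>> {
        self.reader.seek(SeekFrom::Start(4))?;
        let mut body = vec![0u8; self.header.body_len as usize];
        self.reader.read_exact(&mut body)?;
        Ok(body)
    }
}

fn main() {
    let data = Cursor::new(b"MF\x01\x03abc".to_vec());
    let mut f = LazyFile::open_stream(data).expect("parse header");
    assert_eq!(f.header.version, 1);
    // Only now is the rest of the stream read.
    assert_eq!(f.body().unwrap(), b"abc");
    println!("ok");
}
```

Callers that only inspect the header never pay for reading the body, which is exactly the tradeoff the commit message describes.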
75 src/lib.rs
@@ -4,51 +4,48 @@
 //!
 //! # Capabilities
 //!
-//! **Contains no unsafe code**: Many of the other rust ELF parsers out there
-//! contain bits of unsafe code deep down or in dependencies to
-//! reinterpret/transmute byte contents as structures in order to drive
-//! zero-copy parsing. They're slick, and that also introduces unsafe code
-//! blocks (albeit small ones). This crate strives to serve as an alternate
-//! implementation with zero unsafe code blocks.
+//! ### Endian-aware:
+//! This crate properly handles translating between file and host endianness
+//! when parsing the ELF contents.
 //!
-//! **Endian-aware**: This crate properly handles translating between file and
-//! host endianness when parsing the ELF contents.
+//! ### Lazy parsing:
+//! This crate strives for lazy evaluation and parsing when possible.
+//! [File::open_stream()][File::open_stream] reads, parses and validates the ELF
+//! File Header, then stops there. All other i/o and parsing is deferred to
+//! being performed on-demand by other methods on [File]. For example,
+//! [File::symbol_table()](File::symbol_table) reads the data for the symbol
+//! table and associated string table then returns them with types like
+//! [SymbolTable](symbol::SymbolTable) and
+//! [StringTable](string_table::StringTable) which simply act as an
+//! interpretation layer on top of `&[u8]`s, where parsing of
+//! [Symbol](symbol::Symbol)s and strings take place only when they are
+//! requested.
 //!
-//! **Lazy parsing**: This crate strives for lazy evaluation and parsing when
-//! possible. For example, the [SymbolTable](symbol::SymbolTable) simply
-//! acts as an interpretation layer on top of a `&[u8]`. Parsing of
-//! [Symbol](symbol::Symbol)s takes place only when symbols are requested.
+//! ### Lazy i/o:
+//! This crate provides two ways of parsing ELF files:
+//! * From a `&[u8]` into which the user has already read the full contents of the file
+//! * From a Read + Seek (such as a `std::file::File`) where file contents are read
+//!   lazily on-demand based on what the user wants to inspect.
 //!
-//! **Tiny compiled library size**: At the time of writing this, the release lib
-//! was only ~30kB!
+//! These allow you to decide what tradeoff you want to make. If you're going to be working
+//! with the whole file at once, then the byte slice approach is probably worthwhile to minimize
+//! i/o overhead by streaming the whole file into memory at once. If you're only going to
+//! be inspecting part of the file, then the Read + Seek approach would help avoid the
+//! overhead of reading a bunch of unused file data just to parse out a few things.
 //!
+//! ### No unsafe code:
+//! Many of the other rust ELF parsers out there contain bits of unsafe code
+//! deep down or in dependencies to reinterpret/transmute byte contents as
+//! structures in order to drive zero-copy parsing. They're slick, and there's
+//! typically appropriate checking to validate the assumptions to make that
+//! unsafe code work, but nevertheless it introduces unsafe code blocks (albeit
+//! small ones). This crate strives to serve as an alternate implementation with
+//! zero unsafe code blocks.
 //!
 //! # Future plans
 //!
-//! **Add no_std option**: Currently, the main impediment to a no_std option is the
-//! use of allocating datastructures, such as the parsed section contents' Vec<u8>.
-//!
-//! **Lazily loading section contents**: Currently, all of the section data is read
-//! from the input stream into allocated Vec<u8> when the stream is opened. This can
-//! be unnecessarily expensive for use-cases that don't need to inspect all the section
-//! contents.
-//!
-//! A potential future vision for both of these issues is to rework the parsing
-//! code's reader trait implementations to provide two options:
-//!
-//! * A wrapper around a `&[u8]` which already contains the full ELF contents. This
-//!   could be used for a no_std option where we want to simply parse out the ELF
-//!   structures from the existing data without needing to heap-allocate buffers
-//!   in which to store the reads.
-//! * An allocating CachedReader type which wraps a stream which can allocate
-//!   and remember Vec<u8> buffers in which to land the data from file reads.
-//!
-//! The former no_std option is useful when you need no_std, however it forces
-//! the user to invoke the performance penalty of reading the entire file
-//! contents up front.
-//!
-//! The latter option is useful for use-cases that only want to interpret parts
-//! of an ELF object, where allocating buffers to store the reads is a much
-//! smaller cost than reading in the whole large object contents.
+//! **Add no_std option** This would disable the Read + Seek interface and limit
+//! the library to the `&[u8]` parsing impl.
 //!
 
 pub mod file;
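The two parsing entry points the doc comment describes (a borrowed `&[u8]` versus a caching stream wrapper) can be prototyped as one trait with two impls. `ReadBytesAt` is the bound the diff's `File<R: ReadBytesAt>` actually uses, but this cut-down signature and the `CachedReader` type below are guesses for illustration, not the crate's real definitions:

```rust
use std::collections::HashMap;
use std::io::{Error, ErrorKind, Read, Seek, SeekFrom};

// Cut-down stand-in for the trait that File<R: ReadBytesAt> is generic over.
// (Returning an owned Vec keeps the sketch simple; a real zero-copy slice
// impl would hand back a borrowed &[u8] instead.)
trait ReadBytesAt {
    fn read_bytes_at(&mut self, offset: u64, len: usize) -> std::io::Result<Vec<u8>>;
}

// Option 1: the caller already read the whole file into a slice.
impl ReadBytesAt for &[u8] {
    fn read_bytes_at(&mut self, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        let start = offset as usize;
        self.get(start..start + len)
            .map(|s| s.to_vec())
            .ok_or_else(|| Error::new(ErrorKind::UnexpectedEof, "read past end of slice"))
    }
}

// Option 2: a stream wrapper that performs i/o on demand and remembers
// each read so repeated requests don't hit the underlying stream again.
struct CachedReader<R: Read + Seek> {
    inner: R,
    cache: HashMap<(u64, usize), Vec<u8>>,
}

impl<R: Read + Seek> ReadBytesAt for CachedReader<R> {
    fn read_bytes_at(&mut self, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        if let Some(buf) = self.cache.get(&(offset, len)) {
            return Ok(buf.clone()); // served from cache, no i/o
        }
        self.inner.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.inner.read_exact(&mut buf)?;
        self.cache.insert((offset, len), buf.clone());
        Ok(buf)
    }
}

fn main() {
    let data: &[u8] = b"ELF file bytes";

    let mut slice_reader: &[u8] = data;
    assert_eq!(slice_reader.read_bytes_at(4, 4).unwrap(), b"file");

    let mut cached = CachedReader {
        inner: std::io::Cursor::new(data.to_vec()),
        cache: HashMap::new(),
    };
    assert_eq!(cached.read_bytes_at(9, 5).unwrap(), b"bytes");
    assert_eq!(cached.read_bytes_at(9, 5).unwrap(), b"bytes"); // second call hits the cache
    println!("ok");
}
```

Because `File` is generic over the trait rather than over a concrete reader, both the "stream everything up front" and the "read lazily on demand" strategies plug into the same parsing code, which is the tradeoff the doc comment asks the user to choose.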