Virtio net #643

Open
jounathaen wants to merge 7 commits into main from virtio-net
@jounathaen
Member

@jounathaen jounathaen commented Feb 19, 2024

This is a rebase and rework of #536 done by @BaderSZ

Original description:

This is a somewhat-functional implementation for virtio-net. I'll follow up with comments, tests and likely a few examples for RustyHermit soon.

Fixes: #1
Fixes: #1162

@codecov

codecov bot commented Feb 19, 2024

Codecov Report

❌ Patch coverage is 82.89054% with 161 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.38%. Comparing base (c5d5737) to head (a11593b).
⚠️ Report is 18 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/virtio/net.rs | 82.00% | 81 Missing ⚠️ |
| src/pci.rs | 68.42% | 30 Missing ⚠️ |
| src/params.rs | 4.34% | 22 Missing ⚠️ |
| src/linux/x86_64/kvm_cpu.rs | 80.00% | 9 Missing ⚠️ |
| src/virtio/pci.rs | 84.61% | 6 Missing ⚠️ |
| src/linux/x86_64/virtio_device.rs | 94.87% | 4 Missing ⚠️ |
| src/net/tap.rs | 95.00% | 4 Missing ⚠️ |
| src/vm.rs | 85.00% | 3 Missing ⚠️ |
| src/bin/uhyve.rs | 50.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
+ Coverage   75.99%   81.38%   +5.38%     
==========================================
  Files          27       31       +4     
  Lines        4033     4642     +609     
==========================================
+ Hits         3065     3778     +713     
+ Misses        968      864     -104     

@jounathaen jounathaen marked this pull request as draft February 19, 2024 16:28
@jounathaen
Member Author

If I understand correctly, using virtio-queue basically forces us to use virtio-mem as the memory backend (which makes #645 obsolete).

@jounathaen jounathaen force-pushed the virtio-net branch 8 times, most recently from ff445af to abe8633 Compare February 19, 2024 18:10
@BaderSZ
Contributor

BaderSZ commented Mar 28, 2024

Some bugs were found when it comes to handling BARs: size/mode aren't handled correctly. I'll document progress and fix on BaderSZ#2

@github-actions github-actions bot added the feature/security and tests labels Mar 12, 2026
@jounathaen
Member Author

jounathaen commented Mar 12, 2026

Blocked by hermit-os/kernel#2326

@jounathaen jounathaen force-pushed the virtio-net branch 3 times, most recently from e682d18 to 4ca871c Compare March 13, 2026 11:12
@jounathaen jounathaen force-pushed the virtio-net branch 3 times, most recently from 8eff073 to ae9e118 Compare March 13, 2026 12:57
@jounathaen
Member Author

Yay! Ready to merge!
@fogti @n0toose (@BaderSZ), do you want to take a look?

let mut uhyve_rw_paths: Vec<PathBuf> = vec![PathBuf::from("/dev/kvm")];
let mut uhyve_rw_paths: Vec<PathBuf> = vec![
PathBuf::from("/dev/kvm"),
#[cfg(target_os = "linux")]
Member

Unnecessary, Landlock is only available on Linux.


fn r#continue(&mut self) -> HypervisorResult<VcpuStopReason> {
loop {
let virtio_device = || self.peripherals.virtio_device.lock().unwrap();
Member

Is this intentional?

Contributor

It definitely looks like a rebasing/merge mistake.

It should probably look like:

let virtio_device = || self.peripherals.virtio_device.map(|vd| vd.lock().unwrap());
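
A minimal sketch of that suggestion, assuming the device is stored as an `Option<Mutex<…>>` behind a shared reference. `FakeVirtioDevice` and `Peripherals` are hypothetical stand-ins, not Uhyve's types. One detail the one-liner glosses over: calling `.map` directly on the field would try to move the `Mutex` out of the shared struct, so the sketch borrows with `as_ref()` first.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for the real virtio-net device type.
#[derive(Default)]
struct FakeVirtioDevice {
    writes: u32,
}

/// Hypothetical stand-in for `VmPeripherals`, reduced to the one relevant field.
struct Peripherals {
    virtio_device: Option<Mutex<FakeVirtioDevice>>,
}

impl Peripherals {
    /// Locks the device if networking is enabled, otherwise returns `None`.
    fn lock_virtio(&self) -> Option<MutexGuard<'_, FakeVirtioDevice>> {
        // `as_ref()` borrows the `Option` instead of moving the `Mutex` out of `self`.
        self.virtio_device.as_ref().map(|vd| vd.lock().unwrap())
    }
}
```

Callers then decide what a missing device means (for example, ignoring the port access), instead of panicking when networking is disabled.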

}
match port {
//TODO:
// Legacy PCI addressing method
Member

@n0toose n0toose Mar 13, 2026

Not sure about this: If this is legacy, is this necessary for v2? (I think it's probably independent but could be made not to be that way. It would be best if that were moved into a separate function, similarly to the hypercall handling.)

}
VIRTIO_PCI_QUEUE_PFN => {
virtio_device().write_pfn(&addr, &self.peripherals.mem);
self.pci_addr = Some(unsafe { *(addr.as_ptr() as *const u32) });
Member

SAFETY comment that explains why this is necessary would be appreciated.
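
For illustration, two ways the read could look, assuming `addr` is a byte slice holding the data of the I/O write. Both helper names are hypothetical. The first avoids `unsafe` entirely; the second keeps the pointer read but states its preconditions in a SAFETY comment. Note that the pointer cast in the diff also requires 4-byte alignment, which a `&[u8]` does not guarantee, so the sketch uses `read_unaligned`.

```rust
use std::convert::TryInto;

/// Safe variant: reads a little-endian `u32` from the first four bytes.
/// Returns `None` if the buffer is too short.
fn read_u32_le(data: &[u8]) -> Option<u32> {
    let bytes: [u8; 4] = data.get(..4)?.try_into().ok()?;
    Some(u32::from_le_bytes(bytes))
}

/// Unsafe variant with its justification spelled out; reads in native byte order.
fn read_u32_unaligned(data: &[u8]) -> Option<u32> {
    if data.len() < 4 {
        return None;
    }
    // SAFETY: `data` is a valid slice with at least 4 initialized bytes (checked
    // above), and `read_unaligned` has no alignment requirement on the pointer.
    Some(unsafe { data.as_ptr().cast::<u32>().read_unaligned() })
}
```

On a little-endian host such as x86_64 both return the same value; the safe variant makes the byte order explicit.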

let err = io::Error::other(format!("{debug:?}"));
return Err(err.into());
}
VcpuExit::MmioWrite(addr, data) => match addr {
Member

Did anything in particular motivate moving this further down?

Contributor

@fogti fogti left a comment

It looks good overall, but some decisions like parametrizing VmPeripherals over the entire virtualization backend, or a lack of documentation of large, new APIs feel worthy of improvement.

config: Some(PathBuf::from("config.txt")),
#[cfg(feature = "instrument")]
trace: Some(PathBuf::from(".")),
net: Some(String::from("tap10")),
Contributor

This format doesn't look like it conforms to the format mentioned in the command line description above. Shouldn't it be "tap:tap10"?

Also, you should probably use Some("tap10".to_string()) instead.

pub struct KvmVm {
vm_fd: VmFd,
peripherals: Arc<VmPeripherals>,
pub(crate) vm_fd: VmFd,
Contributor

This is a bad idea because this hampers the ability to maintain cross-platform compatibility long-term if platform differences are spread too much across the codebase.

Member

+1

Member

This could be fixed by "gating" whatever needs to access vm_fd behind a function for VirtualizationBackendInternal, which we intend to do more for gdb cross-platform compatibility. We can stub the function declaration in XhyveCpu with unimplemented/unreachable in the meantime.
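
A rough sketch of that gating idea. Every name here (`BackendInternal`, `register_irq`, `KvmLike`, `XhyveLike`, `setup_network`) is a hypothetical placeholder rather than Uhyve's actual API. The point is only that the file descriptor stays private to the KVM-side type, and generic code goes through a trait method.

```rust
/// Hypothetical, reduced backend trait; not the real `VirtualizationBackendInternal`.
trait BackendInternal {
    /// Registers an interrupt line; how this happens is backend-specific.
    fn register_irq(&mut self, irq: u32) -> Result<(), String>;
}

/// Stand-in for the KVM backend. The descriptor stays private to this type.
struct KvmLike {
    vm_fd: i32,
    registered: Vec<u32>,
}

impl BackendInternal for KvmLike {
    fn register_irq(&mut self, irq: u32) -> Result<(), String> {
        if self.vm_fd < 0 {
            return Err("invalid VM file descriptor".to_string());
        }
        // The real backend would issue its KVM call on `vm_fd` here.
        self.registered.push(irq);
        Ok(())
    }
}

/// Stand-in for a backend without virtio-net support yet.
struct XhyveLike;

impl BackendInternal for XhyveLike {
    fn register_irq(&mut self, _irq: u32) -> Result<(), String> {
        unimplemented!("virtio-net is not supported on this backend yet")
    }
}

/// Platform-independent code only ever sees the trait.
fn setup_network<B: BackendInternal>(backend: &mut B) -> Result<(), String> {
    backend.register_irq(5)
}
```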


PciConfigurationAddress(pci_addr & 0x3ff),
addr,
);
}
Contributor

Why do we not do anything if we can't read? I would at least expect a warn!ing.

}
}

/// Thin Wrapper around `EventFd` to implement `VirtQueueInterrupter`
Contributor

Could you merge the two wrappers into one, that just implements both traits?

};
queue.set_ready(stat);
// we'll need to set if we're enabling, as queue is_valid will return false
// the queue is disabled
Contributor

"if" missing?

}

#[cfg(target_os = "linux")]
pub(crate) fn start_network_threads<
Contributor

Can we ensure this only gets called once the DeviceStatus::DRIVER_OK is set, or what's the intended call order?

I think the intended interaction should be documented.
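
One way the call order could be enforced and documented, as a sketch under assumptions: `NetDevice`, `write_status` and the `threads_running` flag are hypothetical names, and the thread spawning itself is elided. The only value taken from the virtio specification is the `DRIVER_OK` status bit (4).

```rust
/// DRIVER_OK bit of the virtio device status field (value 4 in the virtio spec).
const DRIVER_OK: u8 = 4;

/// Hypothetical, reduced device state.
#[derive(Default)]
struct NetDevice {
    status: u8,
    threads_running: bool,
}

impl NetDevice {
    /// Called when the guest writes the device status register.
    fn write_status(&mut self, status: u8) {
        self.status = status;
        // Start the worker threads exactly once, and only after DRIVER_OK.
        if status & DRIVER_OK != 0 && !self.threads_running {
            self.start_network_threads();
        }
    }

    /// Must only be called once the driver has set DRIVER_OK.
    fn start_network_threads(&mut self) {
        debug_assert!(self.status & DRIVER_OK != 0, "called before DRIVER_OK");
        // Real code would spawn the RX/TX threads here.
        self.threads_running = true;
    }
}
```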

}

#[allow(dead_code)]
fn reset_interrupt(&mut self) {
Contributor

If this is neither used nor implemented, nor documented, I think this method should just be removed.

Member

If it's a placeholder for some sort of future work, I'd explain what the intention is/could be in a comment.

DeviceLow,
}

enum ThreadStartMsg {
Contributor

This enum ThreadStartMsg has a weird name. Perhaps use ThreadControlMsg instead?

Comment on lines +108 to +112
pub(crate) struct VmPeripherals<VirtBackend: VirtualizationBackendInternal> {
pub file_mapping: Mutex<UhyveFileMap>,
pub mem: MmapMemory,
pub mem: Arc<MmapMemory>,
pub(crate) serial: UhyveSerial,
pub virtio_device: Mutex<VirtioNetPciDevice>,
pub virtio_device: Option<Mutex<VirtBackend::VirtioNetImpl>>,
Contributor

Suggested change
pub(crate) struct VmPeripherals<VirtBackend: VirtualizationBackendInternal> {
pub file_mapping: Mutex<UhyveFileMap>,
pub mem: MmapMemory,
pub mem: Arc<MmapMemory>,
pub(crate) serial: UhyveSerial,
pub virtio_device: Mutex<VirtioNetPciDevice>,
pub virtio_device: Option<Mutex<VirtBackend::VirtioNetImpl>>,
pub(crate) struct VmPeripherals<Virtio> {
pub file_mapping: Mutex<UhyveFileMap>,
pub mem: Arc<MmapMemory>,
pub(crate) serial: UhyveSerial,
pub virtio_device: Option<Mutex<Virtio>>,

This would avoid dragging the entire VirtualizationBackend around everywhere, and make this struct work more similarly to VirtualCPU.
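
To illustrate the effect of that suggestion, here is a reduced sketch in which the struct is generic over the device type alone. `Peripherals`, `DummyNet`, and the helper functions are hypothetical, and `Vec<u8>` stands in for the guest memory type.

```rust
use std::sync::{Arc, Mutex};

/// Reduced stand-in for `VmPeripherals`, generic over the device type only.
struct Peripherals<Virtio> {
    mem: Arc<Vec<u8>>,
    virtio_device: Option<Mutex<Virtio>>,
}

/// Hypothetical network device type.
struct DummyNet {
    packets: usize,
}

/// Code that never touches the device needs no bound on `V` at all.
fn mem_size<V>(p: &Peripherals<V>) -> usize {
    p.mem.len()
}

/// Code that only cares whether a device exists also stays bound-free.
fn has_network<V>(p: &Peripherals<V>) -> bool {
    p.virtio_device.is_some()
}
```

Functions like `mem_size` no longer mention the backend at all, which is the decoupling the comment asks for.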

Member

@n0toose n0toose left a comment

I did another pass, took a bit but I hope it's all helpful.

let fd = OpenOptions::new()
.read(true)
.write(true)
.open("/dev/net/tun")?;
Member

I'm not 100% sure about that: Is it possible to turn this into a const? Would be nice for Landlock, as we wouldn't have to split things across the codebase.
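
A small sketch of what sharing the path through a constant could look like. `TUN_DEVICE_PATH`, `open_tun` and `landlock_rw_paths` are hypothetical names; only the two device paths come from the diff.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::PathBuf;

/// Single source of truth for the TUN/TAP control device path.
pub const TUN_DEVICE_PATH: &str = "/dev/net/tun";

/// Opens the control device for reading and writing.
fn open_tun() -> io::Result<File> {
    OpenOptions::new().read(true).write(true).open(TUN_DEVICE_PATH)
}

/// Paths the sandbox needs read/write access to, built from the same constant.
fn landlock_rw_paths() -> Vec<PathBuf> {
    vec![PathBuf::from("/dev/kvm"), PathBuf::from(TUN_DEVICE_PATH)]
}
```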

let res =
unsafe { tun_set_iff(fd.as_raw_fd(), &config_str as *const ifreq as u64).unwrap() };

if res == -1 {
Member

Are you completely sure that this is correct? If I'm not misunderstanding anything, the function can return more than just -1 (also watch out for the err = register_netdevice(tun->dev)): https://elixir.bootlin.com/linux/v6.19.8/source/drivers/net/tun.c#L2692-L2831

You could check whether the result is negative, I think. Optionally, you could present more detailed errors depending on the error.

Tangentially relevant example:

uhyve/src/hypercall.rs

Lines 240 to 260 in c5d5737

/// Translates the last error in `errno` to a value suitable to return from the hypercall.
fn translate_last_errno() -> Option<i32> {
let errno = io::Error::last_os_error().raw_os_error()?;
// A loop, because rust can't know for sure that errno numbers don't overlap on the host.
macro_rules! error_pairs {
($($x:ident),*) => {{[ $((libc::$x, hermit_abi::errno::$x)),* ]}}
}
for (e_host, e_guest) in error_pairs!(
EBADF, EEXIST, EFAULT, EINVAL, EIO, EISDIR, EOVERFLOW, EPERM, ENOENT, EROFS
) {
if errno == e_host {
return Some(e_guest);
}
}
warn!(
"No Hermit equivalent of host error {} (errno: {errno}), returning default to guest...",
io::Error::from_raw_os_error(errno)
);
None
}
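
As a sketch of the "check whether the result is negative" suggestion: the helper below treats any negative return value as a failure and wraps the errno in an `io::Error`. `check_ioctl_result` is a hypothetical name, and errno is passed in explicitly to keep the example testable; real code would more likely obtain it via `io::Error::last_os_error()`, as the excerpt above does.

```rust
use std::io;

/// Converts a raw ioctl-style return value into a `Result`.
/// Assumption: any negative value signals failure and `errno` holds the cause.
fn check_ioctl_result(res: i32, errno: i32) -> io::Result<i32> {
    if res < 0 {
        Err(io::Error::from_raw_os_error(errno))
    } else {
        Ok(res)
    }
}
```

The resulting `io::Error` carries the OS error code, so callers can report a more specific message than a bare comparison against -1 allows.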

/// Vendor-specific PCI capability.
/// Section 4.1.4 virtio v1.2
#[derive(IntoBytes, Clone, Copy, Debug, Immutable)]
#[repr(C)]
Member

I'm a bit skeptical about this (together with CfgType's #[repr(u8)]) because of the use of padding and mix of repr's, but I'm not 100% positive on what else to recommend: https://doc.rust-lang.org/nomicon/other-reprs.html

#[repr(C)]
pub struct NotifyCap {
pub cap: PciCap,
/// Combind with queue_notify_off to derive the Queue Notify address
Member

nit: empty line under line 191

Member

also, typo?

/// All data should be treated as little-endian.
#[derive(IntoBytes, Clone, Copy, Debug, Immutable)]
#[repr(C)]
#[allow(dead_code)]
Member

nit: add reason =
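
For reference, lint attributes accept a `reason` field on Rust 1.81 and newer. A sketch with a hypothetical struct and an invented justification text:

```rust
/// Hypothetical capability layout; the fields exist to match a wire format.
#[allow(
    dead_code,
    reason = "fields mirror the spec layout and are read by the guest, not by host code"
)]
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct ExampleCap {
    cap_vndr: u8,
    cap_next: u8,
    cap_len: u8,
}
```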


/// Provides the empty but linked datastructures for VirtioPCI. See module level description for the internal memory layout.
pub fn new() -> Self {
let mut h: Self = Default::default();
Member

nit: could be initialized as:

Self {
  pci_config.hdr.capabilities_ptr = Self::COMMON_CAP_START,
  // insert rest here
  ..Default::default(),
}
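
Rust's struct update syntax does not accept dotted field paths, so a compilable version of this idea nests one literal per level. The sketch below uses hypothetical types (`Hdr`, `PciConfig`, `Device`) and an assumed value for `COMMON_CAP_START`:

```rust
/// Hypothetical PCI header, reduced to two fields.
#[derive(Default)]
struct Hdr {
    capabilities_ptr: u8,
    status: u16,
}

#[derive(Default)]
struct PciConfig {
    hdr: Hdr,
    bar0: u32,
}

#[derive(Default)]
struct Device {
    pci_config: PciConfig,
    irq: u8,
}

/// Assumed offset, for illustration only.
const COMMON_CAP_START: u8 = 0x40;

impl Device {
    fn new() -> Self {
        Self {
            pci_config: PciConfig {
                hdr: Hdr {
                    capabilities_ptr: COMMON_CAP_START,
                    ..Default::default()
                },
                ..Default::default()
            },
            // No trailing comma is allowed after the base expression.
            ..Default::default()
        }
    }
}
```

With several nested fields to set, this can end up longer than the mutable `Default::default()` approach, so which form reads better depends on how many fields are involved.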

warn!("PciConfigurationAddress not at word boundary");
}

if address < IOBASE || address >= IOEND {
Member

Given that this function is only used to display information when panicking (when an MmioRead takes place outside of IOBASE_U64..IOEND_U64), what information will be shown when a panic takes place? I feel like this method is some sort of a noop, if I'm not missing anything?

https://github.com/hermit-os/uhyve/pull/643/changes/BASE..a11593bfc417e8a71fc95cb3485704c83894b074#diff-6ca3166bfc834f52b037b258a244b66773daf1e31305c590e5ae5ed70766487bR15-R21

Member

Alternatively: Wouldn't it be best to make such a check part of the impl Add<usize> instead, if we assume that a PciConfigurationAddress should never, under any circumstance, "escape" this range?

It might be a bit redundant, but could prevent future bugs or something like that.
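
A sketch of what a range-checked addition could look like. `ConfigAddress` is a simplified stand-in for `PciConfigurationAddress`, a `checked_add` method is used instead of `impl Add` so that failure is visible in the return type, and the `IOEND` value is an assumption (only `IOBASE` appears in the diff).

```rust
const IOBASE: u64 = 0xFE00_0000;
/// Assumed end of the window, for illustration only.
const IOEND: u64 = IOBASE + 0x1000;

/// Offset into the I/O window; simplified stand-in for the real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ConfigAddress(u32);

impl ConfigAddress {
    /// Adds `rhs`, returning `None` on overflow or if the result leaves the window.
    fn checked_add(self, rhs: u32) -> Option<Self> {
        let offset = self.0.checked_add(rhs)?;
        (IOBASE + u64::from(offset) < IOEND).then_some(Self(offset))
    }
}
```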

/// as IO/MMIO writes are otherwise dismissed.
// pub const IOBASE: u64 = 0xFE000000;
pub const IOBASE_U64: u64 = 0xFE000000;
pub const IOBASE: GuestPhysAddr = GuestPhysAddr::new(IOBASE_U64);
Member

We have two declarations of pub const IOBASE, one here and one at the end of src/virtio/mod.rs with a different data type (u32) compared to the GuestPhysAddr with an underlying u64.

/// For now, use an address large enough to be outside of kvm_userspace,
/// as IO/MMIO writes are otherwise dismissed.
pub const IOBASE: u32 = 0xFE000000;
const VIRTIO_MSI_NO_VECTOR: u16 = 0xffff;

}

pub fn guest_address(&self) -> GuestPhysAddr {
IOBASE + self.0 as u64
Member

This is not really a complaint, but I generally feel a bit uneasy when not using methods for adding addresses, I think we generally do that for a limited amount of cases.

For ASLR, I explicitly used add to signal that implicitly (followed by a checked_sub).

uhyve/src/vm.rs

Line 181 in c5d5737

(guest_address, guest_address.add(KERNEL_OFFSET))

I think that it is worth clarifying that this is only used when initializing new objects, i.e. malicious input from the guest does not come into play.

https://github.com/hermit-os/uhyve/pull/643/changes/BASE..a11593bfc417e8a71fc95cb3485704c83894b074#diff-af2b0cf9b680aa3f499f6e47b1ac24010fb51479ac57e6e0d33001747c2411b5R75-R82

if address < IOBASE || address >= IOEND {
return None;
}
Some(Self((address - IOBASE) as u32))
Member

@n0toose n0toose Mar 14, 2026

checked_sub? (see review comment above)
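
A sketch of the `checked_sub` variant, using plain `u64` addresses in place of `GuestPhysAddr` and an assumed `IOEND`. `ConfigAddress` and `from_guest_address` are simplified stand-ins for the code in the diff.

```rust
use std::convert::TryFrom;

const IOBASE: u64 = 0xFE00_0000;
/// Assumed end of the window, for illustration only.
const IOEND: u64 = IOBASE + 0x1000;

#[derive(Debug, PartialEq, Eq)]
struct ConfigAddress(u32);

/// Maps a guest-physical address to an offset inside the I/O window.
fn from_guest_address(address: u64) -> Option<ConfigAddress> {
    if address >= IOEND {
        return None;
    }
    // `checked_sub` rejects addresses below IOBASE without a separate comparison.
    let offset = address.checked_sub(IOBASE)?;
    // `try_from` replaces the silent truncation of an `as u32` cast.
    u32::try_from(offset).ok().map(ConfigAddress)
}
```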

// }
// pub fn read_lower(&self) -> u32 {
// self.address as u32
// }
Member

leftover comments

// The precise timestampe can be important when debugging networking,
builder.format_timestamp_nanos().try_init().ok();

let bin_path = build_hermit_bin("network_test", BuildMode::Debug);
Member

Would be nice if this could be changed to invoke run_vm directly (after #1302 is merged), perhaps through another helper. The env_logger_build() in the irrelevant PR might have to be adjusted.

@fogti
Contributor

fogti commented Mar 14, 2026

Also, the MTU should be read from the interface information, right?

@n0toose
Member

n0toose commented Mar 15, 2026

> Also, the MTU should be read from the interface information, right?

Yes, it's accessible via sysfs, but I also think it's completely fine if we get this merged and implement that later. I'd presume that the current approach is sufficient for now, as long as the value is decoupled from the Linux default (healthy practice or so).
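
For a later follow-up, a minimal sketch of reading the MTU from sysfs. Linux exposes it as a decimal number in `/sys/class/net/<interface>/mtu`; the function names are hypothetical.

```rust
use std::path::PathBuf;
use std::{fs, io};

/// Builds the sysfs path that holds the MTU of `iface`.
fn mtu_sysfs_path(iface: &str) -> PathBuf {
    PathBuf::from("/sys/class/net").join(iface).join("mtu")
}

/// Parses the file contents, which are a decimal number followed by a newline.
fn parse_mtu(contents: &str) -> Option<u32> {
    contents.trim().parse().ok()
}

/// Reads the MTU of `iface`, failing if the interface or the value is invalid.
fn read_mtu(iface: &str) -> io::Result<u32> {
    let raw = fs::read_to_string(mtu_sysfs_path(iface))?;
    parse_mtu(&raw).ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "unparsable MTU"))
}
```

A caller could fall back to a default constant when `read_mtu` fails, which keeps the default decoupled from the lookup.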

@jounathaen
Member Author

Thank you both for the extensive feedback! 🙂
I'll get back to it in the next few days!

data.copy_from_slice(&[offs, 0]);
}

pub fn write_selected_queue(&mut self, data: &[u8]) {
Member

Why is #[inline] used in the functions above but not e.g. here?

Labels

feature/security Concerns security-related behavior, soundness, isolation or reliability. tests Integration and unit-tests verifying Uhyve's behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove tuntap crate add basic virtio support

4 participants