The Linux Foundation Collaboration Summit is an exclusive, invitation-only event that brings together core kernel developers, distribution maintainers, ISVs, end users, system vendors and other community organizations for plenary sessions and face-to-face workgroup meetings to tackle the most pressing issues facing Linux today. If your company is not a member of The Linux Foundation and you are interested in joining, please visit our website to learn more about how you can become a Corporate Member.
In a virtual environment, many guests run on a single hypervisor, so the reliability of the KVM hypervisor is critical. One of the key features is "hardware error handling." To minimize the area affected when a hardware error such as a Machine Check is detected, the failing hardware must be isolated and only the affected guest shut down. Linux hardware error handling has three key features: pre-failure detection, failure isolation, and continuity after isolation. These features are generally implemented in the upstream kernel; however, some important issues are still unresolved. This presentation will show the current implementation of the three key features, detail the unresolved issues, and explain current activities to solve them. The target audience is kernel developers who are interested in the reliability of virtual environments.