Topics
-
Ray: A System for High-performance, Distributed Python Applications
By: Dean Wampler
Experience Level: Intermediate
Length: 30 Minutes
Description:Ray is a framework for distribution and scaling of clustered, high-performance, Python applications. It is used in several ML/AI systems and production deployments. This talk explains the problems that Ray solves, including rapid execution of “tasks” and management of distributed state, such as model parameters during training. I’ll use several example applications to illustrate. You'll learn when and how to use Ray in your projects.
-
SLIs, SLAs, SLD’OHs! Learning About Service Uptime from Homer Simpson
By: Mason Egger
Experience Level: Novice
Length: 30 Minutes
Description:Building services is important, but what happens after they are built and running in production? How do we establish trust with our customers that our service will actually be available? Who creates these definitions and how do we measure them? Service Level Indicators (SLI), Agreements (SLA), and Objectives (SLO) are central to an operations mindset and foundational tools for effective Site Reliability Engineering. This talk will take you on a journey through Springfield as we discuss exactly what SLIs, SLAs, and SLOs are, how to measure them, what targets should be measured, how to define uptime, availability, and acceptable error rates, and what happens when they are breached. Attendees will leave with a clear understanding of how to monitor and report for their services, how SLIs, SLAs and SLOs can aid in this process, and how to implement them within their own teams.