Python Machine Learning, Second Edition - Second Edition

by Sebastian Raschka, Jared Huffman, Vahid Mirjalili, Ryan Sun
September 2017
Content level: Intermediate to advanced
622 pages
15h 13m
English
Packt Publishing

Dealing with class imbalance

We've mentioned class imbalances several times throughout this chapter, and yet we haven't actually discussed how to deal with such scenarios appropriately if they occur. Class imbalance is quite a common problem when working with real-world data: samples from one class or multiple classes are over-represented in a dataset. Intuitively, we can think of several domains where this may occur, such as spam filtering, fraud detection, or screening for diseases.

Imagine the breast cancer dataset that we've been working with in this chapter consisted of 90 percent healthy patients. In this case, we could achieve 90 percent accuracy on the test dataset by just predicting the majority class (benign tumor) for all samples, without ...
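
To make the accuracy argument concrete, here is a minimal sketch using synthetic labels rather than the actual breast cancer data. The 90/10 split, the label encoding (0 for benign, 1 for malignant), and the helper functions `accuracy` and `recall` are assumptions made purely for illustration. It shows that a model which always predicts the majority class scores 90 percent accuracy while never identifying a single minority-class sample.

```python
# Hypothetical labels: 0 = benign (majority), 1 = malignant (minority).
y_true = [0] * 90 + [1] * 10

# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive samples that were predicted as positive."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(f"accuracy: {acc:.2f}")            # accuracy: 0.90
print(f"recall (malignant): {rec:.2f}")  # recall (malignant): 0.00
```

The high accuracy here comes entirely from the class distribution, which is why a metric such as recall on the minority class can be more informative on imbalanced data.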

Publisher Resources

ISBN: 9781787125933