Small Big Data: using NumPy and Pandas when your data doesn't fit in memory

Tips and tricks for dealing with larger-than-memory data on a single computer

May 17, 2023

Please join PyData Pittsburgh for a special presentation of the talk "Small Big Data: using NumPy and Pandas when your data doesn't fit in memory" by Itamar Turner-Trauring. We'll gather in person at Code & Supply, and Itamar will join us via video link from Cambridge, MA.

About the talk

Your data is too big to fit in memory—loading it crashes your program. There's no need to switch to a complex Big Data cluster just yet, though! Much of the time you can process your data simply and quickly with your existing tools running on a single computer.

In this talk you’ll learn the basic techniques for dealing with larger-than-memory data on a single computer: money, compression, batching, and indexing. You’ll learn how to apply these techniques to NumPy and Pandas specifically, along with the key concepts you can carry over to other libraries and to the specifics of your own data.
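To give a flavor of what two of these techniques can look like in Pandas, here is a minimal sketch. It is not taken from the talk. The column names `user_id` and `amount`, the helper `total_per_user`, and the in-memory CSV standing in for a large file are all made up for illustration. It combines compression (asking for narrower dtypes than the defaults) with batching (reading a fixed number of rows at a time through the `chunksize` parameter of `pd.read_csv`).

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a CSV file too large to load in one go.
csv_text = "user_id,amount\n" + "\n".join(
    f"{i % 5},{i}" for i in range(1000)
)


def total_per_user(source, chunksize=100):
    """Sum `amount` per `user_id`, holding only one chunk in memory at a time."""
    # Compression: narrower dtypes than the default int64 shrink each chunk.
    # This assumes the values actually fit in these types.
    dtypes = {"user_id": np.uint8, "amount": np.uint16}
    partials = []
    # Batching: with chunksize set, read_csv yields DataFrames one by one.
    for chunk in pd.read_csv(source, dtype=dtypes, chunksize=chunksize):
        # Widen before summing so the per-chunk totals cannot overflow uint16.
        amounts = chunk["amount"].astype("int64")
        partials.append(amounts.groupby(chunk["user_id"]).sum())
    # The partial results are small, so combining them in memory is cheap.
    return pd.concat(partials).groupby(level=0).sum()


totals = total_per_user(io.StringIO(csv_text))
print(totals)
```

With a real file you would pass a path instead of the `io.StringIO` object. The right chunk size depends on how much memory each row takes and how much you can spare.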

About the speaker

Itamar Turner-Trauring is the creator of Sciagraph, a performance and memory profiler for Python data science, scientific computing, and other data processing. He has worked on multimedia CD-ROMs, an airline reservation system, spatial gene sequencing, and more. He writes about Python performance and Python Docker packaging at https://pythonspeed.com.
