Setting Up Python for Machine Learning on Windows

This post was originally published on Real Python on Oct 31st, 2018 by Renato Candido.

Python has been widely used for numerical and scientific applications in recent years. However, to perform numerical computations efficiently, Python relies on external libraries, often implemented in other languages. One example is NumPy, which is largely implemented in C and can link against linear algebra libraries written in Fortran.

Due to these dependencies, sometimes it isn’t trivial to set up an environment for numerical computations, linking all the necessary libraries. It’s common for people to struggle to get things working in workshops involving the use of Python for machine learning, especially when they are using an operating system that lacks a package management system, such as Windows.
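
A quick way to see which parts of such an environment are already in place is to ask Python which packages it can locate. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` and the list of package names are illustrative assumptions, not something taken from the article.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list of packages a machine learning workshop might need.
wanted = ["numpy", "scipy", "sklearn", "keras"]

print("Python", sys.version.split()[0])
print("Missing packages:", missing_packages(wanted) or "none")
```

The check only confirms that a package can be found on the import path. It does not verify that compiled extensions load correctly, which is where linking problems tend to show up.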

In this article, you’ll:

  • Walk through the details for setting up a Python environment for numerical computations on a Windows operating system
  • Be introduced to Anaconda, a Python distribution designed to circumvent these setup problems
  • See how to install the distribution on a Windows machine and use its tools to manage packages and environments
  • Use the installed Python stack to build a neural network and train it to solve a classic classification problem
Read More

Pure Python vs NumPy vs TensorFlow Performance Comparison

This post was originally published on Real Python on May 7th, 2018 by Renato Candido.

Python has a design philosophy that stresses allowing programmers to express concepts readably and in fewer lines of code. This philosophy makes the language suitable for a diverse set of use cases: simple scripts for the web, large web applications (like YouTube), a scripting language for other platforms (like Blender and Autodesk’s Maya), and scientific applications in several areas, such as astronomy, meteorology, physics, and data science.

It is technically possible to implement scalar and matrix calculations using Python lists. However, this can be unwieldy, and performance is poor when compared to languages suited for numerical computation, such as MATLAB or Fortran, or even some general-purpose languages, such as C or C++.
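
To make the point concrete, here is a minimal sketch of a matrix product written with nested Python lists. The function name `matmul` and the sample matrices are assumptions chosen for illustration; they do not come from the original article.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # zip(*b) iterates over the columns of b.
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

Every multiplication and addition here runs through the interpreter one element at a time, which is a large part of why this approach slows down on bigger matrices.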

To circumvent this deficiency, several libraries have emerged that maintain Python’s ease of use while lending the ability to perform numerical calculations in an efficient manner. Two such libraries worth mentioning are NumPy (one of the pioneer libraries to bring efficient numerical computation to Python) and TensorFlow (a more recently rolled-out library focused more on deep learning algorithms).

  • NumPy provides support for large multidimensional arrays and matrices along with a collection of mathematical functions to operate on these elements. The project relies on well-known packages implemented in other languages (like Fortran) to perform efficient computations, bringing the user both the expressiveness of Python and a performance similar to MATLAB or Fortran.
  • TensorFlow is an open-source library for numerical computation originally developed by researchers and engineers on the Google Brain team. The main focus of the library is to provide an easy-to-use API to implement practical machine learning algorithms and deploy them to run on CPUs, GPUs, or a cluster.

But how do these schemes compare? How much faster does the application run when implemented with NumPy instead of pure Python? What about TensorFlow? The purpose of this article is to begin to explore the improvements you can achieve by using these libraries.

To compare the performance of the three approaches, you’ll build a basic regression with native Python, NumPy, and TensorFlow.
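
As a rough idea of what the pure Python baseline in such a comparison could look like, the sketch below fits a straight line by batch gradient descent and times the fit. It is not the article’s code: the single-variable model, the synthetic data, the learning rate, the number of epochs, and the name `fit_line` are all assumptions made for illustration.

```python
import random
import time


def fit_line(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit y = w0 + w1 * x by batch gradient descent on mean squared error."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every sample.
        errors = [w0 + w1 * x - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w0 and w1.
        grad_w0 = 2.0 * sum(errors) / n
        grad_w1 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        w0 -= learning_rate * grad_w0
        w1 -= learning_rate * grad_w1
    return w0, w1


# Synthetic data (assumed): y = 1 + 3x plus a little Gaussian noise.
random.seed(0)
xs = [random.uniform(0.0, 2.0) for _ in range(200)]
ys = [1.0 + 3.0 * x + random.gauss(0.0, 0.1) for x in xs]

start = time.perf_counter()
w0, w1 = fit_line(xs, ys)
elapsed = time.perf_counter() - start
print(f"w0={w0:.3f}, w1={w1:.3f}, fitted in {elapsed:.4f} s")
```

Timing the same fit after swapping the list comprehensions for array operations would give a first, informal impression of the kind of speedup the comparison is about.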

Read More

Using Python Fabric to automate GNU/Linux server configuration tasks

Fabric is a Python library and command-line tool for automating application deployment and system administration tasks via SSH. It provides tools for executing local and remote shell commands (the latter over SSH) and for transferring files over SFTP. With these tools, you can write deployment or administration scripts that perform these tasks with a single command.

Read More

E-mail backup with NoPriv.py

Listening to the Linux Action Show podcast, I heard about a Python script to back up e-mails from IMAP accounts that downloads messages and attachments and presents everything on a nice local HTML page.

The script is called NoPriv.py and can be found on this page and on this GitHub repository. Given an IMAP e-mail account and a list of folders to be copied, it creates an HTML file structure for accessing the messages, as shown in this demo page. The backup can be made incrementally, transferring only the new messages each time the script is run.

Read More

Python for scientific applications – the very basics

Lately, I’ve been studying a little Python, and the more I learn, the more interested I become in the language. It was my first contact with a dynamically typed language, and from the first time I used it, I had the impression it could be a good replacement for MATLAB. I like MATLAB, but because it is proprietary and not a general-purpose language, Python is a better solution for me.

Read More