
Image Classification With Audio Data

April 9, 2020

This post was originally part of the DataRobot Community. Visit now to browse discussions and ask questions about DataRobot, AI Cloud, data science, and more.

Recently, I ran my own proof of concept (POC) to determine whether I could use the new image classification capabilities of DataRobot to classify images sourced from audio files. I used sound files that were recorded in clinical settings. Specifically, I took audio from patients with either normal or abnormal heartbeats, converted it into spectrograms (image files), and then used DataRobot to classify the images (heartbeats) as Normal or Abnormal.

In this blog, I describe how to take sound data and make it ready for image classification. You can find the data for my POC here. You can also find code for this use case in this Community GitHub repo.

The images below illustrate the visual differences between spectrograms for a normal heartbeat versus an abnormal heartbeat (in this case, caused by a murmur).

Normal Heartbeat Spectrogram


Abnormal Heartbeat Spectrogram (murmur)


Create spectrograms of data for DataRobot

I only needed to run this step once to generate the images, so the call is commented out. (Make sure to replace “/folder/” with the actual location of your WAV files.) Running it produces PNG files with the same filenames as the WAV files, saved in the same folder as the WAV files. Each PNG is the visual representation of one sound recording.

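# spectrogramFolder() is not base R; it appears to come from the soundgen package
# (an assumption, the post does not name it), so load that package first: library(soundgen)
# Uncomment the call below to write one PNG per WAV file found in "/folder/".
#spectrogramFolder("/folder/", htmlPlots = TRUE, verbose = TRUE, step = NULL, overlap = 50, wn = "gaussian",
#                  zp = 0, ylim = NULL, osc = TRUE, xlab = "Time, ms",
#                  ylab = "kHz", width = 900, height = 500, units = "px",
#                  res = NA)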
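To make concrete what a spectrogram encodes, here is a minimal base R sketch of a short-time Fourier transform on a synthetic tone. It is not what `spectrogramFolder()` does internally and it is not the code used in the POC: the function name `spectrogram_matrix`, the Hann window (the call above uses a Gaussian one), the window length, and the test signal are all assumptions chosen for illustration.

```r
# Short-time Fourier transform sketch (base R only, illustrative).
# Returns a matrix with one row per frequency bin and one column per time frame.
spectrogram_matrix <- function(signal, window_length = 256, overlap = 0.5) {
  hop <- max(1, floor(window_length * (1 - overlap)))
  starts <- seq(1, length(signal) - window_length + 1, by = hop)
  # Hann window to reduce spectral leakage at the frame edges
  win <- 0.5 - 0.5 * cos(2 * pi * (0:(window_length - 1)) / (window_length - 1))
  half <- window_length / 2
  sapply(starts, function(s) {
    frame <- signal[s:(s + window_length - 1)] * win
    Mod(fft(frame))[1:half]  # keep magnitudes of the non-redundant bins
  })
}

# Synthetic 200 Hz tone, one second long (hypothetical stand-in for a WAV file)
sample_rate <- 4000
time_s <- seq(0, 1, by = 1 / sample_rate)
tone <- sin(2 * pi * 200 * time_s)

spec <- spectrogram_matrix(tone)

# Frequency (Hz) of the strongest bin in the first frame
peak_hz <- (which.max(spec[, 1]) - 1) * sample_rate / 256
print(dim(spec))
print(peak_hz)
```

Each column of `spec` is one slice of time, and the brightness of a pixel in the PNG corresponds to the magnitude stored in one cell. With a 256-sample window the bins are about 15.6 Hz apart, so the peak lands on the bin nearest 200 Hz rather than exactly on it.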

Set up folders

Create a folder for your training data and create subfolders for each class. Move the images for each class into the correct subfolder for the related class. Compress the whole training folder into a ZIP file. Create a test folder and move the test images into that folder. You then compress that folder into a ZIP file as well. You can upload these zipped folders of images directly into DataRobot for training and testing.
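The folder layout described above can be sketched in base R. This is an illustration under stated assumptions, not the script used for the POC: the helper `organize_by_class`, the placeholder file names, and the label vector are hypothetical, and the demo works on empty files in a temporary directory.

```r
# Copy each image into a subfolder named after its class label.
# `files` and `labels` are parallel character vectors (hypothetical inputs).
organize_by_class <- function(files, labels, dest) {
  stopifnot(length(files) == length(labels))
  for (cls in unique(labels)) {
    dir.create(file.path(dest, cls), recursive = TRUE, showWarnings = FALSE)
  }
  targets <- file.path(dest, labels, basename(files))
  ok <- file.copy(files, targets, overwrite = TRUE)
  stopifnot(all(ok))
  invisible(targets)
}

# Demo with placeholder files standing in for spectrogram PNGs
src <- file.path(tempdir(), "spectrograms")
dir.create(src, showWarnings = FALSE)
demo_files <- file.path(src, c("a.png", "b.png", "c.png"))
file.create(demo_files)
demo_labels <- c("normal", "murmur", "normal")

train_dir <- file.path(tempdir(), "train")
copied <- organize_by_class(demo_files, demo_labels, train_dir)
print(list.files(train_dir, recursive = TRUE))
```

Once the folders are in place, the training folder can be compressed with any ZIP tool. From R, `utils::zip()` is one option, but it relies on an external zip program being installed on the system.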

Below is an example of what the training image folder (and its subfolders) should look like before you zip it. Remember to create two ZIP files: one for the training dataset and one for the testing dataset.


Run the project in DataRobot

First, set up the project as you would any other classification project.

Upload the zipped training file and type in “class” for the target.


You can also preview the images before you run Autopilot.



The Leaderboard populates in the same way as it does for other types of data. I chose LogLoss as the optimization metric for this classification problem.


Blueprint for the best model

The best model in this case was a tuned Light Gradient Boosted Trees Classifier.


Global Confusion Matrix for each class

The model did pretty well at identifying the classes. If I were a clinician, I would rather have a lot of false positives than false negatives, because missing a real pathology is worse than flagging a healthy heartbeat for a second look.

  • For the “normal” heartbeat recordings, the F1 score is 0.91, with very high recall (0.98) and good precision (0.86).
  • For the “murmur” heartbeat recordings, the F1 score is only moderate (0.69). Precision is high (0.83), but recall is low (0.59), which means roughly 41% of the murmur recordings were missed.
  • For the “extrasystole” heartbeat recordings, the F1 score is 0.86, with a recall of 0.75 and a precision of 1.
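The F1 score is the harmonic mean of precision and recall, so the figures above can be cross-checked with a few lines of R. This is a minimal sketch: the helper name `f1_score` is my own, and the inputs are the rounded precision and recall values quoted in the list.

```r
# F1 is the harmonic mean of precision and recall.
f1_score <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

# Rounded values as reported in the post
reported <- data.frame(
  class     = c("normal", "murmur", "extrasystole"),
  precision = c(0.86, 0.83, 1.00),
  recall    = c(0.98, 0.59, 0.75)
)
reported$f1 <- round(f1_score(reported$precision, reported$recall), 2)
print(reported)
```

Recomputing from rounded inputs gives 0.92 for “normal” rather than the reported 0.91. A difference of that size is what you would expect if the reported score was calculated from unrounded precision and recall.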


I uploaded the zipped prediction file and calculated the results. Then, I downloaded them and renamed the dataset to “scores.csv.”

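library(stringr)  # provides str_sub() and str_length()

Pred <- read.csv('scores.csv')    # predictions downloaded from DataRobot
Actual <- read.csv('scoreB.csv')  # actual labels for the same test recordings

# Highest predicted class probability in each row
Pred$pred <- pmax(Pred$Prediction.extrasystole, Pred$Prediction.murmur, Pred$Prediction.normal)

# Drop the row identifier so only numeric probability columns remain
Pred$row_id <- NULL

# Column name holding each row's maximum; "first" breaks the tie with the pred copy
Pred$Class <- colnames(Pred)[max.col(Pred,ties.method="first")]
# Assumes columns 2 to 4 of scoreB.csv hold one indicator column per class
Actual$Class <- colnames(Actual[, 2:4])[max.col(Actual[, 2:4],ties.method="first")]

Pred$Actual <- tolower(Actual$Class)

# Strip the 11-character "Prediction." prefix to leave the bare class name
Pred$Class <- str_sub(Pred$Class, 12, str_length(Pred$Class))

# Confusion table: actual classes in rows, predicted classes in columns
table(Pred$Actual, Pred$Class)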
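A table like the one produced by `table(Pred$Actual, Pred$Class)` is enough to derive per-class precision and recall. The sketch below shows the arithmetic in base R on a handful of made-up labels; the vectors `actual` and `predicted` are synthetic and are not the POC results.

```r
# Synthetic labels for illustration only; not the POC results.
actual    <- c("normal", "normal", "murmur", "murmur", "extrasystole", "normal")
predicted <- c("normal", "normal", "normal", "murmur", "extrasystole", "murmur")

# Use the same factor levels on both axes so the table is square
classes <- sort(unique(c(actual, predicted)))
cm <- table(Actual    = factor(actual, levels = classes),
            Predicted = factor(predicted, levels = classes))

true_pos  <- diag(cm)
recall    <- true_pos / rowSums(cm)  # share of each actual class that was found
precision <- true_pos / colSums(cm)  # share of each predicted class that was right

metrics <- data.frame(class     = classes,
                      precision = as.numeric(precision),
                      recall    = as.numeric(recall))
print(cm)
print(metrics)
```

Fixing the factor levels up front matters: if a class is never predicted, a plain `table()` call would drop its column and the diagonal would no longer line up with the rows.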

Concluding Remarks

This POC demonstrated that it is possible to use DataRobot to classify spectral images of sound.

Computer vision solutions are becoming more and more prevalent. The ability to automate the classification of spectrograms opens up a new range of opportunities for the DataRobot Community.

Visual AI on sound links:


See the reference materials for including images in your DataRobot project

About the author
Linda Haviland

Community Manager
