Doing networked video ASCII art in Python

Today we’ll see how to send an ASCII art-styled webcam video stream over the network using Python!

This article is also available on Medium.

A few months ago, I came across this video by Micode (in French) where the young YouTuber presented his 24-hour project on how to transform the stream from his webcam into animated ASCII art – so instead of a normal video stream, you get a sequence of images composed only of characters, like this:

I really liked the idea and I decided to reproduce it; then, I took it one step further and added a little server/client sockets logic so I could send my webcam stream to my other computer 😉

ASCII art: using characters to draw images

What is ASCII art?

ASCII art is old, but it’s still amazing and quite trendy in the geek world. It’s about displaying images that are not composed of pixels but of ASCII characters, in other words composed of plain old text characters. What you depict can be more or less complex, it can be just a word (where letters are made of letters, yay!), a simple geometric figure or even a pretty realistic image. The whole thing relies on the fact that if you print enough characters with a somewhat small font size next to each other, then you’ll feel like it’s a (nearly) continuous splash of pixels and therefore your eye will be able to catch the shapes, fill in the blanks, and reconstruct the full image.

Wikipedia’s logo in ASCII art – By Wonsoh (https://en.wikipedia.org/wiki/File:Wikipedia-Ascii.png)

Depending on the exact type of ASCII art, the final image might be more or less “realistic”: some ASCII images focus more on simplifying the object than reproducing it exactly, while others aim for more detail to simulate a full-size, pixel-precise render.

There is even animated ASCII art, which displays two or more ASCII images one after the other, just like a real video feed, to create a movie of ASCII art renders:

“Roflcopter”: an animated ASCII art version of a helicopter – By Lewksaft (https://en.wikipedia.org/wiki/File:Roflcopter.gif)

In this tutorial, I won’t talk about colored ASCII art even though there are fabulous results with this technique too! So we’ll focus on grayscale images where pixel colors can only go from black to white.

My initial source of inspiration for the code (which is also Micode’s by the way) is this great Github repo by uvipen: Python ASCII Generator. This tool can convert images or videos to ASCII art, it works with colored or grayscale inputs and it can use several ASCII tables that each correspond to different languages (like the English alphabet, the Japanese characters, the Korean symbols…).

How to do ASCII art?

Now, how do you do that? How can you transform a “normal” image into ASCII art? The two important concepts to keep in mind when you do ASCII art are that:

  • images are composed of W x H pixels, where W is the width of your image and H is its height
  • each pixel (in a grayscale image) only has a gray value, so it can be represented by an integer between 0 (black) and 255 (white)

Ultimately, this means that an image is a 2D array of integers from 0 to 255. For example, this super-basic (and super-small) image can be represented by the integer array below:

[
  [  0, 255, 128, 128],
  [  0,   0, 255, 255],
  [255, 255, 255,   0],
  [128,   0,   0, 255]
]

Of course, the more pixels you have, the less “blocky” and the smoother the result looks – this is called the resolution of your image. But the more pixels you have, the larger your image is, because it contains a lot more data! ASCII art is nice in that regard because you can significantly simplify your ASCII image (and therefore the number of “pixels” in it) without worrying too much about it losing its smoothness… since you’re limited to the size of a character anyway 😉

To turn an image into its exact ASCII art equivalent, meaning we put one character for one pixel, the process is pretty straightforward:

  • we go through each pixel of the original image
  • we get its gray value
  • we map it to a corresponding ASCII character

This mapping requires us to define a range of ASCII characters that are more or less “full”. Think of characters like the # or the @: those fill the space and, when repeated, they provide a dense splash of contrasting color:

##########
@@@@@@@@@@

On the other hand, the space character or a dot are way more “empty”, in the sense that they fill less space when repeated:

          
..........

Depending on the surface you’re painting your ASCII art on and the color of your text, the “full” characters will of course be more white or more black. With a light background and dark text, “full” characters are the most black and “empty” characters are the most white; with a dark background and light text, it’s reversed:

The more ASCII characters you have in your mapping, the more precisely you can distinguish the different levels of gray in the image. If you only take two characters (like . and #), you’ll only be able to represent pure black and white, so you’ll need to drastically round all gray values to those extremes. At the other end of the spectrum, if you take 256 characters, you can map every single gray value to its own character.

I myself find that having a bit less than 10 characters already gives good ASCII art… in this project, my palette is:  .:-=+_?*#%@ 🙂

At this point, you can already write a small script that directly converts an image to its exact ASCII art equivalent. We saw that images can be represented as 2D arrays, so we’ll use the famous numpy package to manipulate this multidimensional matrix easily:
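
Here is a minimal sketch of what this first version could look like – the make_ascii() name and the character palette are the ones used throughout this article, while the input image path and the exact mapping formula are just one possible implementation:

import cv2
import numpy as np

# our palette, from the most "empty" to the most "full" character
ASCII_CHARS = " .:-=+_?*#%@"

def make_ascii(gray_frame):
    # gray_frame is a 2D numpy array of gray values between 0 and 255
    n_chars = len(ASCII_CHARS)
    lines = []
    for row in gray_frame:
        # map each gray value to one character of the palette
        lines.append("".join(ASCII_CHARS[int(v) * n_chars // 256] for v in row))
    return "\n".join(lines)

if __name__ == "__main__":
    # load the image and convert it to grayscale ("tomatoes.jpg" is just an example path)
    image = cv2.imread("tomatoes.jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    print(make_ascii(gray))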

Here is the result – from left to right, we have the original picture (the photo of tomatoes by Mockup Graphics on Unsplash), its grayscale version and its ASCII version… that my screen can’t fully display!

We immediately note three things on the results of this script:

  • the ASCII images it produces are very large and pretty much impossible to display in a terminal
  • but they’re very detailed and, in fact, you can hardly tell they’re made of ASCII characters
  • the image seems to be deformed vertically

The reason images are so large is simply because an ASCII character is way larger than a pixel (since it’s itself composed of lots of pixels on your screen). The result is deformed vertically because ASCII characters’ bounding boxes aren’t squares but rectangles – a character’s height is roughly twice its width. We can already fix this issue by dividing the output image height by 2 and only taking even rows in the initial image:
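
One possible way to patch the make_ascii() sketch from above is to simply skip every other row of the image (the slicing trick below is just one way to do it):

def make_ascii(gray_frame):
    n_chars = len(ASCII_CHARS)
    lines = []
    # only keep one row out of two to compensate for the roughly 2:1
    # height/width ratio of ASCII characters
    for row in gray_frame[::2]:
        lines.append("".join(ASCII_CHARS[int(v) * n_chars // 256] for v in row))
    return "\n".join(lines)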

At least, now, the image is properly scaled! 🙂

But it’s still large – so it’s heavy to compute and complicated to display on a normal screen. For a one-shot like our tomato example, where we only compute one image at a time, it’s sweet because it gives a detailed result. But when we have a video, we’ll want its frames to be displayed somewhat in real-time. So this computing time will become a nightmare!

Compressing our ASCII art

A better way of transforming our image is to “compress” it: we’re not going to replace each pixel by one character, but rather we’ll take small squares of pixels, compute the average value, and replace all this bunch of pixels by one ASCII character. This will greatly reduce the number of rows and columns in our final ASCII result while maintaining an OK resemblance to the original image.

To compress our ASCII art, we apply a grid on our image and we average the pixels in each cell (extracted from Micode’s video: https://www.youtube.com/watch?v=DBnStqiLB-Q)

Note: this operation where we reduce a matrix by cutting it down in blocks and regrouping blocks as a single value is very frequent in machine learning. It’s known as “pooling”; most notably, “mean pooling” (what we’re doing here) and “max pooling” (we take the max value in the block to represent it) are building blocks of convolutional neural networks (CNNs).

To do that, we’ll decide beforehand how many rows and columns our final ASCII image will have by defining the size ratio to apply to the original image. So a ratio of 0.5 (or 50%) means that our image will have half as many characters per row as there are pixels in a row of the original image; and of course, only a quarter as many rows as there are pixels in height, to keep our vertical-deformation fix.

Then, in our make_ascii() function we won’t iterate on the width and height but on these new values, rows and columns. And instead of taking the gray value of the original pixel directly, we’ll make a mean of all pixels in the neighbourhood:
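
Here is a sketch of this “compressed” version – the make_ascii() name and the ratio parameter match what we use in the rest of the article, and the cell-slicing logic below is one straightforward way of doing the mean pooling:

def make_ascii(gray_frame, ratio=0.1):
    height, width = gray_frame.shape
    n_chars = len(ASCII_CHARS)
    # number of characters in the output: "ratio" times fewer columns,
    # and half as many rows again to keep our vertical-deformation fix
    n_columns = int(width * ratio)
    n_rows = int(height * ratio / 2)
    cell_width = width / n_columns
    cell_height = height / n_rows
    lines = []
    for i in range(n_rows):
        line = ""
        for j in range(n_columns):
            # average the gray values of all the pixels in this cell...
            cell = gray_frame[int(i * cell_height):int((i + 1) * cell_height),
                              int(j * cell_width):int((j + 1) * cell_width)]
            mean = int(np.mean(cell))
            # ...and map the mean to a single ASCII character
            line += ASCII_CHARS[mean * n_chars // 256]
        lines.append(line)
    return "\n".join(lines)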

We can modify the ratio parameter to get a different level of detail. A ratio of 10% means that you’ll get an image ten times smaller than the original one but with quite crude shapes, whereas a ratio of 100% is the fully detailed image like before. Here is our tomato image with ratios of 10, 20 and 50% (all images have been rescaled to fit the same bounding box but of course increasing the ratio requires you to adapt the display):

I think it’s incredible that even with the 10% ratio, we already get an idea of the objects in the picture – our human brain is really excellent at inferring shapes and filling in the blanks! 😉

Programming video using the OpenCV package

When you want to work with images or videos, a famous lib you can use is OpenCV. While the first versions were mainly maintained for C and C++, this package has now been nicely adapted for plenty of programming languages, among which Python with the opencv-python module.

It’s a very powerful lib that lets you read, write, transform, compress, resize and filter images in lots of ways. And since a video is only a sequence of images, or frames, OpenCV can also be useful for videos! It even has some useful I/O methods that let us directly get the computer’s webcam input stream – which will be crucial for our tutorial 🙂

Here is an example of a very basic Python script that uses the OpenCV lib to display the webcam video stream:
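
A minimal version of such a script could look like this (the window name is arbitrary):

import cv2

# get our capture stream: device 0 is the computer's default webcam
cap = cv2.VideoCapture(0)

while True:
    # read the next frame from the camera
    ret, frame = cap.read()
    if not ret:
        break
    # display it in an OpenCV window
    cv2.imshow("frame", frame)
    # wait up to 1 millisecond for a key press, and quit on 'q'
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# clean up: release the video stream and close the display window
cap.release()
cv2.destroyAllWindows()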

This program works like this:

  • first, we get our capture stream: VideoCapture(0) refers to our computer’s webcam stream
  • then, we have an infinite loop:
    • we keep on reading new frames from the camera with cap.read()
    • and we display each new frame in our window with cv2.imshow()
    • the cv2.waitKey(1) call listens for keyboard events and determines the framerate of our application: it waits up to 1 millisecond for a key press; if a key was pressed, we check the second part of the condition, otherwise we continue our loop
    • this means that our program will (try to) refresh every millisecond
    • and if the key we pressed is ‘q’, then we quit the program by breaking out of the infinite loop
  • the final lines do some clean up by releasing the OpenCV video stream and display window

To write our ASCII art image instead of the initial webcam stream frame, we’ll simply need to change the cv2.imshow() line: instead, we call our ASCII conversion on the current frame and print the result to the terminal.

So here’s the adapted script that starts the video stream from the webcam and then continuously gets frames, processes them using the ASCII converter we prepared and prints them in the terminal:
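
Here is a sketch of what this adapted script (video_streamer.py) could look like – the run_ascii_capture() name is the one we’ll reuse later, while details such as stopping the stream with Ctrl+C are just one possible choice:

import argparse
import cv2

# make_ascii() is the conversion function we wrote above (copied into
# this script or imported from its own module)

def run_ascii_capture(ratio=0.1):
    cap = cv2.VideoCapture(0)
    # check that the webcam stream could actually be opened
    if not cap.isOpened():
        raise RuntimeError("could not open the webcam stream!")
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            # convert the frame to grayscale, turn it into ASCII art and print it
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            print(make_ascii(gray, ratio=ratio))
    except KeyboardInterrupt:
        pass  # stop the stream with Ctrl+C
    finally:
        cap.release()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--ratio", type=float, default=0.1)
    args = parser.parse_args()
    run_ascii_capture(ratio=args.ratio)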

I added some checks at the beginning of the video capture to make sure everything’s OK, and I also prepared some argument parsing with Python’s built-in argparse package in the main process. This allows me to easily pass an additional argument on the command line when I run the program, to change the ratio of the output ASCII images:

python video_streamer.py -r 0.2

The default ratio is 10% so I have a light and live ASCII video stream, but it can be increased to your liking (and depending on the performance of your computer too…).

Preparing the networked version

For now, our video streamer automatically outputs to the current terminal. In our project, we want to be able to send this data over the network, using one computer as the server and another one as the client. So we’d better make sure that this printing function is configurable – or more precisely, that we can pass in a custom processing function to our video streamer to apply to each ASCIIed frame.

Python’s perfect for that since we can pass in functions just like any other variable type. We can also pass optional keyword arguments (or “kwargs”) that are directly transferred to the given processing function 🙂
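
For example, our run_ascii_capture() function could take an optional processing function like this (the process_func parameter name and the returned-flag convention below are just one way of doing it):

def run_ascii_capture(ratio=0.1, process_func=print, **kwargs):
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        raise RuntimeError("could not open the webcam stream!")
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # apply the custom processing function to the ASCIIed frame,
            # forwarding any extra keyword arguments
            flag = process_func(make_ascii(gray, ratio=ratio), **kwargs)
            # a processing function can return False to report an error
            # and stop the capture
            if flag is False:
                break
    except KeyboardInterrupt:
        pass
    finally:
        cap.release()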

When we call the video streamer script directly, we use the print function by default, so it will behave just like it did before. But we’re now able to pass this data through another processing pipeline and retrieve a flag to check if there was no error during processing.

Sending the data over a socket!

We can now wrap up this project and add a little server/client pair of scripts to it – this will allow us to send the ASCII-processed frames from one server computer to one or more client computers!

To do this part, I looked at some StackOverflow threads about sending OpenCV data over a network, like this one. The basic idea is as follows:

Server-side

  • we open a new socket on a given host address and port
  • we wait for clients to connect (up to 5, in my example)
  • as soon as one has arrived, we start the live webcam-stream
  • we apply our ASCII processing to each frame
  • each time we get a new processed frame, we send it to the clients using a pickle object with a struct packing

Client-side

  • we try to connect to the server with the given host address and port
  • as soon as we’ve connected, we continuously wait for new frames from the server
  • whenever we get a new frame, we print it to the terminal

To allow for multiple clients at the same time, we need to do multi-threading using Python’s built-in threading package. Creating and using sockets is doable with another Python built-in module: socket.

In the server script, we’re going to have one thread per client, and each thread will run a function called handle_client(). This method simply calls the run_ascii_capture() method we created previously with a custom process function, so that when a frame is finished processing it is not printed to the terminal but sent over the network to the matching client (see the process_frame() function). It also updates the list of current connections so we know if we can still accept more clients or not.

In the client script, we simply try to connect to the server socket and then run an infinite loop that never stops listening to the server. Each time we get a new chunk of data, we store it. Because a frame can be larger than what a single socket read returns, we have a little nested while loop that makes sure we listen long enough to get our entire data packet, i.e. our entire frame: it just concatenates the data it receives until the current data block has reached the announced packet size. Otherwise, we’d print half-finished frames!

When we have the full frame data, we simply print it to the terminal, and then go back to our “waiting for the next frame” state.

Note: remember that the socket’s receiving function is blocking, which means that the program stops at that instruction and stays there until it is done executing – so, until the client has indeed received data. This is why, if for some reason your server can’t send the data to the client, your client program will simply run forever without doing anything…

We can also re-use the argparse module to read our command-line arguments more easily 😉

This gives us the following Python scripts:

Server-side
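
Here is a sketch of what the server script could look like – the “Q” struct format for the length prefix follows the usual approach from the StackOverflow threads mentioned above, the streamer is assumed to live in video_streamer.py, and the other details are illustrative:

import argparse
import pickle
import socket
import struct
import threading

# run_ascii_capture() comes from our video streamer script
from video_streamer import run_ascii_capture

MAX_CLIENTS = 5
connections = []

def handle_client(conn, ratio):
    def process_frame(ascii_frame):
        # serialize the ASCII frame and prefix it with its packed length,
        # so the client knows how many bytes to expect
        data = pickle.dumps(ascii_frame)
        try:
            conn.sendall(struct.pack("Q", len(data)) + data)
            return True
        except (BrokenPipeError, ConnectionResetError):
            return False

    # run the webcam capture, sending each ASCIIed frame to this client
    run_ascii_capture(ratio=ratio, process_func=process_frame)
    # update the list of current connections when the client is gone
    conn.close()
    connections.remove(conn)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=12000)
    parser.add_argument("--ratio", type=float, default=0.1)
    args = parser.parse_args()

    # open a new socket on the given host address and port
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((args.host, args.port))
    server.listen(MAX_CLIENTS)
    print(f"Listening on {args.host}:{args.port}...")

    while True:
        conn, addr = server.accept()
        if len(connections) >= MAX_CLIENTS:
            conn.close()
            continue
        connections.append(conn)
        # one thread per client
        threading.Thread(target=handle_client, args=(conn, args.ratio)).start()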

Client-side
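
And here is a sketch of the matching client script, with the nested receive loop described above (the 4096-byte read size is an arbitrary but common choice):

import argparse
import pickle
import socket
import struct

# size of the length prefix sent by the server
PAYLOAD_SIZE = struct.calcsize("Q")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, required=True)
    parser.add_argument("--port", type=int, default=12000)
    args = parser.parse_args()

    # try to connect to the server with the given host address and port
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((args.host, args.port))

    data = b""
    while True:
        # first, read enough bytes to know the size of the incoming frame
        while len(data) < PAYLOAD_SIZE:
            data += client.recv(4096)
        frame_size = struct.unpack("Q", data[:PAYLOAD_SIZE])[0]
        data = data[PAYLOAD_SIZE:]
        # then, keep concatenating received chunks until we have the whole frame
        while len(data) < frame_size:
            data += client.recv(4096)
        frame_data, data = data[:frame_size], data[frame_size:]
        # deserialize the ASCII frame and print it to the terminal
        print(pickle.loads(frame_data))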

We can run these two scripts either on two computers or (during your tests) in two terminal windows:

# start the server
python server.py --host 0.0.0.0 --port 12000 --ratio 0.2
# start the client
python client.py --host {HOST_ADDR} --port 12000

Where HOST_ADDR is the IP address of the computer you are running the server on.

Boosting the performance of our encoder

At the moment, the whole project works well, but we have a little delay on the client side that is a bit distracting. Moreover, it has been shown (by Schoenenberg et al. in this article from 2014, for example) that video chats drain our energy partly because of the delay (see this recent article by M. Jiang). Let’s try and reduce this delay as much as we can!

You might think that the issue lies in sending data over the network (and of course, if your Internet connection is slow, it will be hard to really solve this). But, in truth, if you run your video streamer purely locally (so the same computer produces and prints the images), you’ll see that the delay is already there! So the problem is mainly with our frame-processing function.

For now, we’re doing everything manually – it’s great for learning, but it’s often not optimized. Remember the adage “don’t reinvent the wheel”? Well, for a big boost in the performance of our ASCII-frame transformer (the make_ascii() function), we can use the scikit-image module, a Python package that specializes in image operations and that has a very useful function in its measure submodule called block_reduce(). As explained in the method’s documentation, block_reduce() is basically an optimized implementation of mean/max pooling – so exactly the operation we want to perform on our image to “compress” it.

Note: just like numpy, the scipy lib and other modules that derive from them are fast and efficient implementations of matrix operations. They actually use C code under the hood instead of pure Python: that’s why they are so much faster! 😉

We can use it in our make_ascii() function to greatly reduce the computing time. We can also use numpy‘s vectorize() function to efficiently apply our gray-to-ASCII character mapping on every cell of the reduced frame:
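
Here is one way the optimized make_ascii() could look – the block sizes derived from the ratio below are an illustrative choice that keeps roughly the same output dimensions as before:

import numpy as np
from skimage.measure import block_reduce

ASCII_CHARS = " .:-=+_?*#%@"

# vectorized gray value -> ASCII character mapping
to_char = np.vectorize(lambda v: ASCII_CHARS[int(v) * len(ASCII_CHARS) // 256])

def make_ascii(gray_frame, ratio=0.1):
    # block size of the mean pooling: cells twice as tall as they are wide,
    # to keep our vertical-deformation fix
    block_width = max(1, int(1 / ratio))
    block_height = 2 * block_width
    # optimized mean pooling over the whole frame at once
    reduced = block_reduce(gray_frame, (block_height, block_width), np.mean)
    # map every pooled cell to its ASCII character and join into lines
    return "\n".join("".join(row) for row in to_char(reduced))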

Remember we still need to apply our division on the vertical dimension to avoid having a deformed image.

Conclusion

It was really fun to work on this project! Micode’s video was a really nice way of getting me started, so big thanks to him 😉

This project could be a nice way of creating a lightweight video chat system, with a geek vibe. Plus, it could help with privacy since it would allow you to easily conceal your identity (you’d appear as a more or less detailed blob of characters)… In this day and age where remote work and video conferences are becoming the norm, the right to privacy is a rising concern. Video chats can be intrusive or a source of stress, and numerous research teams are studying the effects this new means of communication has on our health.

Could an ASCII video-chat avoid some of these bad effects?

Let me know what you think in the comments!

References
  1. Python ASCII Generator by uvipen: https://github.com/uvipen/ASCII-generator
  2. Micode’s Youtube video (in French): https://www.youtube.com/watch?v=DBnStqiLB-Q
  3. K. Schoenenberg, “Why are you so slow? – Misattribution of transmission delay to attributes of the conversation partner at the far-end” (https://www.sciencedirect.com/science/article/abs/pii/S1071581914000287), May 2014. [Online; last access 05-May-2021].
  4. M. Jiang, “The reason Zoom calls drain your energy” (https://www.bbc.com/worklife/article/20200421-why-zoom-video-chats-are-so-exhausting), Apr. 2020. [Online; last access 05-May-2021].
