By Daryna Pesina, COO @It-Jim

If you want to dig into Computer Vision (CV) but have no idea where to start, this beginner guide is for you. Here we recommend some resources that will come in handy for learning and understanding the basics of both computer vision and deep learning.

When you search for a computer vision engineer position, you’re likely to see that companies are looking for a candidate with:

  • digital image processing understanding and knowledge of classical computer vision algorithms,
  • background in mathematics,
  • sufficient skills in programming (Python and C++ are the most required),
  • knowledge of main libraries for classical CV (like OpenCV and NumPy for Python),
  • machine learning / deep learning (ML/DL) understanding,
  • knowledge of main ML/DL libraries (like TensorFlow, Keras, PyTorch),
  • experience.

Let’s now go step by step and see how and where to cover each item from the list above:

Digital Image Theory and Processing Methods

Do you know what a digital image is? How are color pixels formed? Have you heard about color spaces, histograms, image filters, and convolution? If you answered ‘No’ to any of these questions, the video course on digital image processing presented by Prof. Guillermo Sapiro (Duke University) is a good starting point. You can also check the Digital Image Processing tutorial, which is pretty simple but covers a lot.

As for books on the topic, one of the best is “Digital Image Processing” by Rafael Gonzalez and Richard Woods. Another book, by Ian Young et al., explains the fundamentals of digital image processing and is freely available. As for classical computer vision algorithms, Richard Szeliski’s book “Computer Vision: Algorithms and Applications” is quite comprehensive, and a free draft version is available online.

Want to dive into the geometry of image formation, projective transformations, or multi-view geometry? Try the course by the University of Pennsylvania on Coursera or the book “Multiple View Geometry in Computer Vision” by Richard Hartley and Andrew Zisserman.

A hint: tutorials on digital image processing often use OpenCV examples to build practical knowledge, so it can be useful to study this topic while exploring OpenCV itself (see our recommendations in the OpenCV section below).
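To make the ideas of image filters and convolution a bit more concrete, here is a minimal sketch in Python with NumPy. It is not taken from any of the courses or books above: the helper name `convolve2d`, the tiny 5x5 image, and the 3x3 box filter are all made up for illustration, and the loop-based implementation is meant to be readable rather than fast.

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' 2D convolution of a grayscale image with a small kernel.

    Hypothetical helper written for this example; real projects would use
    an optimized library routine instead of Python loops.
    """
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]          # true convolution flips the kernel
    out_h = image.shape[0] - kh + 1       # no padding, so the output shrinks
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            window = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(window * flipped)
    return out

# A tiny 5x5 grayscale "image": dark background with one bright pixel.
image = np.zeros((5, 5), dtype=np.float64)
image[2, 2] = 9.0

# 3x3 box (mean) filter: each output pixel is the average of its neighborhood.
box = np.ones((3, 3)) / 9.0

blurred = convolve2d(image, box)
print(blurred.shape)  # (3, 3)
print(blurred)
```

The single bright pixel gets spread evenly over the 3x3 output, which is exactly the smoothing effect a box blur has on a real photo. Swapping `box` for another kernel (for example, an edge-detection kernel) changes what the filter does without changing the convolution code.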

Do I Need to Know Maths for Computer Vision?

When it comes to maths, you will need linear algebra, calculus, and probability theory. Most likely, you studied them at university. The good news is that this should be enough. Yet, refreshing your knowledge is always a good idea: the Immersive Math interactive book and video explanations of basic math concepts can help you with this. A nice overview of possible mathematical areas that can be of use for CV is given here. You can always refer to that material if you need a cheat sheet.

What Programming Language Is Needed for Computer Vision?

If you use C++, keep going, but Python is the most requested programming language in CV/ML/DL. It is easy to learn, powerful, and great for CV, ML, and DL tasks. Learn everything from the ground up or level up your skills with Real Python. There are plenty of free tutorials, structured links to useful resources, and video courses available. An extensive online tutorial from the Python developers is another great option to master this skill.

Knowing the basics of the NumPy library is a must. It is used for numerical data preparation and processing. There is a short example-based tutorial to start with. If you prefer video tutorials, check Learn NUMPY in 5 minutes.
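As a small taste of why NumPy matters for CV, the sketch below treats an image as a plain array. The 2x2 image is a made-up example, and the grayscale weights are the common ITU-R BT.601 luma coefficients; other conversions exist, so take this as one possible illustration rather than the only way to do it.

```python
import numpy as np

# Hypothetical 2x2 RGB image: shape is (height, width, channels), 8-bit values.
rgb = np.array([
    [[255, 0, 0], [0, 255, 0]],        # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],    # blue pixel, white pixel
], dtype=np.uint8)

print(rgb.shape)  # (2, 2, 3)
print(rgb.dtype)  # uint8

# Slicing works like on any array: take the top row, or only the red channel.
top_row = rgb[0, :, :]
red_channel = rgb[:, :, 0]

# Grayscale as a weighted sum over the channel axis (BT.601 luma weights).
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb.astype(np.float64) @ weights).round().astype(np.uint8)
print(gray)

# A 4-bin intensity histogram of the grayscale image.
counts, bin_edges = np.histogram(gray, bins=4, range=(0, 256))
print(counts)
```

Indexing, slicing, type conversion, and whole-array arithmetic like this are the operations you will use all the time once you start working with OpenCV, which represents images as NumPy arrays in Python.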

OpenCV Is a Must

Make this open-source computer vision and machine learning software library your best friend. There are plenty of tutorials; for example, you can start with this post to dig in. A comprehensive guide to most of the functions is available on the OpenCV tutorial webpage, where you can go on learning digital image processing with examples. You can always check the Learn OpenCV blog for some implemented projects.

Machine Learning and Deep Learning Libraries

Learning ML/DL libraries is useless without knowing the theory. We suggest you start by understanding the theory behind ML algorithms and neural networks first and then implement it in code. Here, it would be a mistake not to mention the classics: the Machine Learning course by Andrew Ng on Coursera and the Deep Learning book by Ian Goodfellow et al. An online book on Neural Networks and Deep Learning by Michael Nielsen may help you, too. Just a kind warning: these are not for kids, there are math formulas inside! Stanford University also offers a couple of extensive lecture series online: Computer Vision (with deep learning) and Convolutional Neural Networks for Visual Recognition. Last but not least, a recent course from New York University by Yann LeCun gives an overview of the latest techniques in deep learning and is available in both video and text formats.
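To show what turning theory into code can look like at the smallest scale, here is a hedged sketch of gradient descent fitting a straight line with plain NumPy. It is not taken from any of the courses above; the toy data, the learning rate, and the number of steps are assumptions chosen so that the example converges quickly.

```python
import numpy as np

# Toy data from a known line y = 2x + 1 (hypothetical, chosen for illustration).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0        # model parameters, starting from zero
learning_rate = 0.1    # step size (an assumed value that works for this data)

for step in range(2000):
    pred = w * x + b                   # forward pass: model prediction
    error = pred - y
    loss = np.mean(error ** 2)         # mean squared error
    grad_w = 2.0 * np.mean(error * x)  # derivative of the loss w.r.t. w
    grad_b = 2.0 * np.mean(error)      # derivative of the loss w.r.t. b
    w -= learning_rate * grad_w        # gradient descent update
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.2e}")
```

The parameters end up close to the true slope 2 and intercept 1. Neural network libraries automate the same loop (prediction, loss, gradients, update) for models with millions of parameters, which is why the theory is worth understanding before picking a framework.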

Once you have mastered the basics of neural networks and their main parameters, it’s time to do some coding. There are two main paths to follow here: TensorFlow (with Keras built in) from Google or PyTorch from Facebook. Knowing both of them would give you a couple of extra points, of course. Both the PyTorch and TensorFlow websites offer quite comprehensive tutorials. To dive into TensorFlow even deeper, try the Hands-On Machine Learning book by Aurélien Géron. The awesome PyImageSearch blog by Adrian Rosebrock can help you a lot. The oldie but goodie AI Shack also counts. Finally, the technical blog of SicaraAI will give you examples of real CV projects.

Find a Trainee Program in Computer Vision

Now it’s time for practice! If you want to benefit the most, try searching for an internship position or a trainee program. In any case, there are a lot of examples and test datasets on the net, mostly on the websites from the previous section. You can always enter a competition on Kaggle, collaborate with other engineers to solve real-life problems, and get a chance to practice before being employed in the real world. Try to implement some solutions so that you have pet projects to show at job interviews. Then jump on board and apply for a position in a CV/ML/DL company!

Well, what else?… Let’s cover some useful tools that can ease your study:

  • Jupyter and Google Colab

When learning online, you will come across examples or tasks in Jupyter notebooks (wiki) and their online counterpart, Google Colab. Coding there is a bit different from what is usually done in an IDE, so knowing the concept of such notebooks is helpful.

  • Git / GitHub / Bitbucket or other version control system

Git is now the standard version control system. It is useful not only for professional programmers: it also helps a lot when you download examples from the net, share your projects with others, and demonstrate your experience at job interviews. You should learn the basic terminal commands and understand what’s going on. Modern IDEs usually expose Git commands in their GUI and take care of the routine tasks.

  • Integrated development environment (IDE)

We recommend the free PyCharm Community version. It is OK to use simple text editors at first, but you will need more features later on. It seems more reasonable to start using an IDE and learn its features step by step than to switch to an IDE when you suddenly realize that your favorite text editor slows down your work.

Conclusion

It’s 2021. AI keeps pushing boundaries and entering more and more areas. The demand for computer vision/deep learning engineers is very likely to keep increasing. Get prepared for this future today 😉

The Ultimate Guide to Developing Skills as a Computer Vision Engineer