Introduction to Artificial Intelligence - AI
In 1997 Garry Kasparov lost a chess match to Deep Blue, a purpose-built computer system designed to be the first computer “intelligent” enough to defeat a reigning world champion at chess. At the time it was thought that we were still years away from a computer performing so well, but by delivering a top-notch game, and making a number of accidental moves (later described as “bugs”), the computer was able to shake the world-class player and begin what Kasparov later described as the replacement of “knowledge workers”. Although that happened decades ago, AI is to some extent still considered a fringe technology, with problems like adversarial attacks, and with progress and setbacks arriving at the pace of a startup rather than an established industry. The lack of mainstream adoption is especially striking considering that the technology has evolved through generations since Deep Blue’s use of the alpha-beta pruning algorithm, a technique for searching game trees.
Recently I was tasked with studying the currently available technology, its applications, limitations and weaknesses, and providing a writeup on how to go about an assessment that gauges the robustness of a system implemented in the real world. After doing my research, I conceived these two articles, which cover the topic as an introduction to artificial intelligence and an exploration of a specific weakness, known as adversarial attacks, and how to defend against it. The second half is not to be confused with “hardening your environment using AI”, which has become THE buzzword at vendor presentations but offers very few steps for securing your AI-driven implementation.
To understand what modern Artificial Intelligence is, one must first understand what it is designed to do. The problem AI is meant to solve is one of “complex” decision making. A human being can take in many individual bits of information and, using all of them, create a mental image of the problem and its unique aspects. Using experience, a person can come up with a solution to such a problem with a very high rate of success. In contrast, a computer faces several challenges when attempting to do the same thing. If a problem has an intractably large variable space (such as chess) or, worse, is always unique (such as video analysis of real-world events), the software must deal with an unknown number of inputs, each potentially having an unknown input type or range, and be able to boil that information down to a defined list of outputs (such as a name lookup or a go/no-go decision). Add to that the need to make decisions at lightning speed, and you understand what AI is meant to do.
It’s clear then that the challenge is tied to data, and whatever the solution is, it will be one of data analysis. Therefore the first thing to understand about AI is that it is a subcategory of the larger field of data analysis, and that it will perform poorly or well as a direct result of the quality of the data it receives. While that concept may be universal, for the purpose of this article I will be using some specific examples of inputs: medical diagnostics, facial recognition and autonomous vehicle inputs, three very common use cases these days. These three are similar in many ways: there is a lot of “data” available for each type, and people are able to collect this data quickly and accurately.
However, there are also differences between the three. For example, a medical device designed to detect a problem will be tuned for the highest sensitivity, accepting a higher rate of false positives, since it relies on a doctor’s second set of eyes. In contrast, facial recognition will probably be more tolerant of false negatives, since a second attempt is just around the corner.
But how does a computer decide based on real-world samples? A computer doesn’t need to “understand” a concept in the same way a person would, but it does follow a similar problem-solving process, repeatedly breaking down a complex question into smaller and easier questions until they can be answered with a given level of certainty (a threshold). Looking at the chart below, we can see that the data in a scatter plot falls within a rough range, and that a red threshold line can be moved higher or lower to tune the trade-off, for example accepting a higher rate of false positives in exchange for a lower rate of failed detections. While linear regression is not generally considered an example of AI, the concept of a PASS/FAIL threshold is best illustrated by this chart.
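The threshold trade-off described above can be sketched in a few lines of Python. This is only an illustration: the scores, the labels and the `count_errors` helper are made up for this example and do not come from any real system.

```python
# Hypothetical detection scores: higher means "more likely a real problem".
# Each pair is (score, actually_positive); all values are invented.
samples = [
    (0.10, False), (0.25, False), (0.40, False), (0.55, False),
    (0.35, True), (0.60, True), (0.75, True), (0.90, True),
]

def count_errors(samples, threshold):
    """Return (false_positives, failed_detections) for a PASS/FAIL threshold."""
    false_positives = sum(
        1 for score, positive in samples if score >= threshold and not positive
    )
    failed_detections = sum(
        1 for score, positive in samples if score < threshold and positive
    )
    return false_positives, failed_detections

# Moving the threshold line trades one kind of error for the other.
for threshold in (0.3, 0.5, 0.7):
    fp, missed = count_errors(samples, threshold)
    print(f"threshold={threshold}: false positives={fp}, failed detections={missed}")
```

With this invented data, the lowest threshold misses nothing but raises two false alarms, while the highest threshold raises no false alarms but misses two real cases. A medical device would likely sit near the low end, a convenience feature near the high end.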
For a real-life example of a facial recognition system, we can look at the geometric measurement of a human face, in which pre-defined facial characteristics are matched against a database of pre-measured criteria. Below is a graphic that shows these steps, demonstrating the lookup of measurements in a facial database. The steps are to locate the face in a larger image and to collect measurements of the geometry of various details. Once these steps are complete, a comparison to existing records, with a defined margin of error, should give you an accurate identification. You can read more in the source articles cited below.
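The final comparison step can be sketched as a nearest-record lookup with a margin of error. This is a simplified illustration, not the method from the cited article: the `FACE_DATABASE` records, the three measurements per face and the `identify` function are all hypothetical, and locating the face and measuring it are assumed to have happened already.

```python
import math

# Hypothetical database: name -> geometric measurements (invented values),
# e.g. eye distance, nose length and jaw width in arbitrary units.
FACE_DATABASE = {
    "alice": (6.2, 4.8, 12.1),
    "bob": (5.7, 5.3, 13.0),
}

def identify(measurements, database, margin_of_error):
    """Return the closest matching name, or None if no record is within the margin."""
    best_name, best_distance = None, math.inf
    for name, record in database.items():
        # Euclidean distance between the new measurements and the stored record.
        distance = math.dist(measurements, record)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= margin_of_error else None

print(identify((6.1, 4.9, 12.0), FACE_DATABASE, margin_of_error=0.5))
print(identify((9.0, 9.0, 9.0), FACE_DATABASE, margin_of_error=0.5))
```

The first call lands close to one stored record and returns its name; the second is far from every record and returns `None`. Widening the margin of error makes the system more forgiving, at the cost of more mistaken identifications.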
With enough data, accurate labeling of the records and proper tuning, the same process can be used to analyze any set of criteria. One could use this approach to detect anomalies in network protocol behavior, diagnose medical issues, or search through online records for specific financial or professional activity. But this model breaks down when you attempt to provide inputs from an environment with limitless variability and no clear definitions, for example when attempting to detect the presence of a cat in a random video, or to understand commands given in an unfamiliar accent.
For that we go to the latest in machine reasoning, Neural Networks.
A neural network is the go-to tool for solving problems that cannot be captured in explicitly programmed rules. One might think of them as problems for which humans need experience, or “intuition”, to sift through a lot of noise. The way this process works is that the input is broken into smaller problems, and those are broken into smaller problems still, passing through layers of “neurons” that each make a decision on one small problem and route their outputs to any number of neurons in the second, third and later layers. The magic of this approach is that the decision logic of the neurons and layers emerges by itself as a result of the training and data inputs, as opposed to being pre-designed by a human being.
The chart below illustrates the concept of going from one layer into the next, routing the decision based on what each deciding neuron figures out. If more than 3 layers are used, this model is often called a Deep Neural Network (DNN), and its depth is limited only by resources and time.
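To make the layer-to-layer idea concrete, here is a minimal sketch of a two-layer network in plain Python. It is deliberately a toy: the weights and biases are hand-picked for illustration, whereas in a real neural network they would be learned from training data, and the function names (`step`, `neuron`, `tiny_network`) are my own.

```python
def step(value):
    """Threshold activation: the neuron 'fires' (1) once its input reaches 0."""
    return 1 if value >= 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def tiny_network(a, b):
    # Hidden layer: two neurons, each answering a smaller, easier question.
    either = neuron((a, b), (1, 1), -1)  # is at least one input on?
    both = neuron((a, b), (1, 1), -2)    # are both inputs on?
    # Output layer: combine the hidden answers into "either, but not both".
    return neuron((either, both), (1, -2), -1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tiny_network(a, b))
```

No single threshold neuron of this kind can answer “exactly one input is on”, but two layers can, because the second layer decides based on what the first layer figured out. Stacking more layers extends the same idea to far messier questions.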
I’ve come across a very good visual from the website Brilliant.org that illustrates this logic well, and you can see it here (3).
This approach is likely the most flexible, with use cases ranging from funny (2) to fairly dystopian (4). However, it is also the least efficient and the most training-intensive. In addition to the higher initial training cost, it brings with it some security gaps in the form of adversarial attacks, which we will discuss at greater length in the second article.
1 - Jain, Anil; Klare, Brendan; Park, Unsang (2012). Face Matching and Retrieval in Forensics Applications. IEEE MultiMedia, 19, 20. doi:10.1109/MMUL.2012.4.
Author - Imanuel is a manager of IT security and operations with 15+ years in infosec and IT. He has held both technical and leadership roles in the medical, banking and government industries, mostly focusing on technology, computer security and the latest security trends. He holds a master’s degree in computer science and a number of security certificates from SANS, ISC2 and more. For hobbies, he pretends to be a good woodworker and spends his days being a proud dad, making s’mores with the kids over burning failed projects he never liked anyway.