* Or just the age of machine learning algorithms

- Dear Readers,
- Wait, what is machine learning?
- Bad Data is a Problem
- What about bad algorithms?
- So, where was I?
- Interview with ChatGPT
- What does this mean for the workplace?
Dear Readers,
Have you heard of ChatGPT? There’s a good probability that you have, as it’s been making the rounds in major news outlets lately. I had largely ignored it until very recently, when it came up in a specific conversation at work. For anyone who exists outside of the tech bubble, though, it probably helps to have some kind of explanation. Perhaps a primer on what machine learning is?
Let’s ask ChatGPT:
[Screenshot: ChatGPT’s explanation of machine learning]
Nailed it.
Wait, what is machine learning?

At the most basic level, let’s just assume you are collecting pictures of cats all day. Maybe you have a cat cafe and accept uploaded pictures of cats from your customers. I don’t know why you are doing this, just stay with me.
You, the human, could stare at these pictures all day, confirm they have cats in them, and approve them individually. This sounds pretty boring.
Like the humans we are, we decide: let’s make a machine do it. But how exactly do you tell a machine to identify pictures of cats? “Four legs, two pointy ears, furry, cute.”
Hold up, this isn’t a cat. So you take a step back, think about it, and ask the question: How would I train a human to do this task?
You typically don’t have to explain to another human person what a cat looks like. Except the brand new humans who are still learning about the world.
What would you tell the brand new human?

I have a couple of tiny humans, and from what I recall, you might tell them the broad strokes, but then you show pictures. “Cat, cat, cat, cat, raccoon, cat, cat, cat, cat, dog, cat, cat…” Over time, the human brain begins to understand the characteristics of “cat,” and now the brain can identify cats even when it hasn’t seen that specific cat before.
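If it helps to see that flashcard idea as code, here’s a toy sketch using scikit-learn’s nearest-neighbor classifier. Everything in it is invented for illustration (the features, the numbers, the animals), but it shows the key point: you never hand the machine a definition of “cat,” you just hand it labeled examples.

```python
# Toy "learning from labeled examples" sketch, not a real image
# classifier. Each picture has been reduced to three made-up numeric
# features: (legs, ear_pointiness, furriness).
from sklearn.neighbors import KNeighborsClassifier

# The flashcards: "cat, cat, cat, cat, raccoon, dog..."
examples = [
    ((4, 0.9, 0.8), "cat"),
    ((4, 0.8, 0.9), "cat"),
    ((4, 0.7, 0.9), "cat"),
    ((4, 0.9, 0.7), "cat"),
    ((4, 0.5, 0.9), "raccoon"),
    ((4, 0.2, 0.7), "dog"),
]
features = [f for f, _ in examples]
labels = [label for _, label in examples]

# The model is never told what a cat is; it only sees examples.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(features, labels)

# A "cat" the model has never seen before.
print(model.predict([(4, 0.85, 0.85)]))  # -> ['cat']
```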
So that’s the super basic concept. Super basic. And having this basic understanding might lead you to a bunch of other questions: “What if the algorithm is wrong?” or “What happens if the data is wrong?” Yeah, well, this happens. Even before machine learning became the hotness, the cybersecurity field was dealing with the bad data problem.
Bad Data is a Problem
[Image: a fish]
So, this is not a cat. If this were included in the training data for a cat-identifying algorithm, we might label it as “not cat.” You might label it as “fish” or, depending on your location, “food.”
However, if you were to label this as “cat” and feed many of them into our cat-identifying algorithm, your cute cat cafe would end up with an aquarium full of unwanted fish pics.
Bad data is not always this obvious, though. And sometimes data is not bad; it’s just wrong.
When you are specifically dealing with non-visual machine data, like the kind you see when dealing with security stuff, it becomes a lot of guesswork. A simple example: Imagine you are setting up a new algorithm to find suspicious logons to systems in your environment. So you pull out all the machine data (all the logs from all those systems) because you’ve been storing it up. You take the last 30 days of data, feed it into an algorithm, and say, “This is what normal looks like.”
You trained your algorithm on this data while making a very basic assumption: everything is normal.
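To make that concrete, here’s a minimal sketch of that kind of baselining, assuming the logs have already been boiled down to one number per day (daily logon counts for a single account). The counts and the threshold are invented for illustration.

```python
# Toy baseline: learn "normal" from 30 days of daily logon counts.
# If an attacker was already active during these 30 days, their
# activity gets baked into "normal". That is the bad assumption
# in action.
from statistics import mean, stdev

history = [12, 9, 11, 10, 13, 8, 12, 11, 10, 9,
           12, 14, 10, 11, 9, 13, 12, 10, 11, 12,
           10, 9, 11, 13, 12, 10, 11, 9, 12, 10]

baseline = mean(history)
spread = stdev(history)

def is_suspicious(todays_logons: int, sigmas: float = 3.0) -> bool:
    """Flag anything more than `sigmas` standard deviations from baseline."""
    return abs(todays_logons - baseline) > sigmas * spread

print(is_suspicious(11))   # False: looks like every other day
print(is_suspicious(250))  # True: far outside the learned "normal"
```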
Some time later, you discovered a compromise in your environment. One of your user accounts has been doing all sorts of not-very-nice things. Your machine learning algorithm didn’t spot it like the salespeople said it would! What has gone wrong?

You started with a bad assumption and then fed data with that bad assumption into the machine. Really, it’s just a reminder that you can’t get away from the fundamentals. You have to baseline and fully understand your environment before you can teach someone or something else how to monitor that environment.
There’s also something to be said here about bias in datasets. If your dataset of cats only includes black cats, and no other types of cats, you stand a good chance of excluding some cats from your cat cafe picture website. It really is just like a human: if you aren’t given good information when you are learning, you’ll spend your whole life with bad ideas.
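Here’s a tiny sketch of that failure mode, with a made-up “fur lightness” feature (0.0 is black, 1.0 is white):

```python
# Toy dataset bias: "learn" what a cat looks like from a dataset
# that only contains black cats. The feature and values are invented.
training_cats = [0.02, 0.05, 0.01, 0.04, 0.03]  # all black cats

# This "model" just memorizes the range of values it has seen.
low, high = min(training_cats), max(training_cats)

def looks_like_a_cat(fur_lightness: float) -> bool:
    return low <= fur_lightness <= high

print(looks_like_a_cat(0.03))  # True: another black cat
print(looks_like_a_cat(0.60))  # False: an orange tabby gets rejected
```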
Anyway…
What about bad algorithms?
Given that algorithms are created by humans for the purpose of allowing a machine to learn, the algorithms are only as good as the people who created them. Here’s a bit of a thought experiment using the cat cafe example above: try creating a series of steps that would tell someone else how to identify cats in a picture. Odds are that your set of steps will be slightly different from mine, and therefore our outcomes might differ considerably.
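For fun, here’s that thought experiment as code, with an invented animal and invented attributes. Two people write down different “steps to identify a cat,” and the outcomes diverge:

```python
# Two humans, two rule sets, one animal. All attributes are made up.
raccoon = {"legs": 4, "pointy_ears": True, "furry": True,
           "retractable_claws": False}

def my_cat_steps(animal: dict) -> bool:
    # My steps: four legs, pointy ears, furry.
    return animal["legs"] == 4 and animal["pointy_ears"] and animal["furry"]

def your_cat_steps(animal: dict) -> bool:
    # Your steps: mine, plus retractable claws.
    return my_cat_steps(animal) and animal["retractable_claws"]

print(my_cat_steps(raccoon))    # True: my algorithm says "cat"
print(your_cat_steps(raccoon))  # False: yours says "not cat"
```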
I don’t want to spend considerable time on this. Bad algorithms aren’t nearly as much of a problem as bad data, and, really, I’ve gotten pretty far off track here.
So, where was I?
Oh, right, I started writing this post because someone at work used ChatGPT. They used it to ask how to solve a problem for their job. Or, put differently, they asked the machine to do their job for them. And that got me thinking: is this a problem?

It made me wonder: how many times have I interviewed someone who pumped my questions through ChatGPT or a similar system and just read the answers back to me? How many times has one of my coworkers been asked a question and used this system to generate an intelligent-sounding response?
And is any of this okay?
Interview with ChatGPT
As an example, I often ask candidates to explain how the Domain Name System (DNS) protocol works. It’s a simple question, and anyone with experience in the security field should be able to answer it in detail. So I asked ChatGPT…
[Screenshot: ChatGPT’s answer explaining how DNS works]
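For anyone playing along at home, here’s roughly the answer I’m fishing for, sketched in code. The actual lookup is a single standard-library call that hides all the interesting parts, so the resolution chain is spelled out in the comments (example.com is just a placeholder):

```python
import socket

# What happens, roughly, when you resolve a name:
#   1. Your stub resolver checks its cache (and /etc/hosts).
#   2. On a miss, it asks a recursive resolver (your ISP's, 8.8.8.8, etc.).
#   3. The recursive resolver walks the hierarchy: it asks a root server
#      where to find .com, asks a .com TLD server where to find
#      example.com, then asks example.com's authoritative server for
#      the final A/AAAA record.
#   4. Each answer is cached according to its TTL on the way back down.
addresses = socket.getaddrinfo("example.com", 443, proto=socket.IPPROTO_TCP)
for family, _, _, _, sockaddr in addresses:
    print(family.name, sockaddr[0])
```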
Another question that I like to ask for cloud-focused positions: Is AWS Lambda truly serverless?
Honestly, this just proves to me that none of the candidates are asking ChatGPT because this answer is far better than any I’ve ever gotten.
I would have accepted: “No.”
So far, ChatGPT is acing the interview. Let me ask it something more complex, something very specific to a particular technology.
[Screenshot: ChatGPT’s answer about Splunk SmartStore]
I’ve been working with Splunk for a very long time. I usually cite something like ten years of experience with this specific technology, but it’s probably been longer. When I interview someone who says they understand Splunk, one thing I go for is asking how indexer operations change between “traditional” implementations and SmartStore-enabled implementations.
In 99% of the interviews I’ve conducted, no one can explain to me the difference.
Did ChatGPT get it right? Frankly, if a candidate answered my question like this, I would have said, “Good enough.”
(Just in case you were wondering what I’m looking for in that question, I’m specifically looking for the candidate to understand that hot bucket processing hasn’t really changed. SmartStore changes indexer operations extensively, but it’s key to understand that data is not immediately stored in the cloud: it lives on the indexer first and is uploaded after the hot bucket is closed.)
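For context, SmartStore gets switched on per index by pointing indexes.conf at a remote object store. Here’s a minimal sketch, with the bucket, credentials, and index name as placeholders; note that hot buckets still land on local storage first, exactly as described above:

```
# indexes.conf: minimal SmartStore sketch. Bucket, keys, and index
# name are placeholders. Hot buckets still live on the indexer and
# are uploaded to the remote store when they roll to warm.
[volume:remote_store]
storageType = remote
path = s3://my-smartstore-bucket/indexes
remote.s3.access_key = <access key>
remote.s3.secret_key = <secret key>

[my_index]
remotePath = volume:remote_store/$_index_name
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
```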
Anyway, ChatGPT didn’t mention index buckets and all that, but… I have to say, it would have passed the interview.
What does this mean for the workplace?
I’m not sure where my colleagues fall on this subject, but I think this is great. Really.
On the one hand, there’s a fear that this could result in replacing some of our jobs. Why would you ask me these basic questions if you can ask a robot and get a good answer without the sarcasm? Or, frankly, without the salary requirements.
On the other hand, this system could save me a lot of time and effort. It’s like a dictionary that I can talk to. I can run all my interview questions through it just to see what kind of response I can expect from something that’s done a lot of research.

Of course, it’s all about perspective. I’m in favor of using this system to enhance your work, but you have to know what questions to ask, and knowing the right questions comes from knowledge and experience. So add this tool to your toolbox and ask it all the questions you want. The machine uprising will probably happen someday, but we’re not there yet.
The reality is that it’s still just a computer program. It’s obvious when you ask:
[Screenshot: ChatGPT explaining that it cannot have opinions or feelings]
Only something programmed by a human would respond that it could not have opinions or feelings because, today, only humans are arrogant enough to think that this is the truth.
Until next time,
JL