How to Leverage Machine Learning in Cybersecurity
For many years, researchers have been trying to figure out how to leverage machine learning in the field of cybersecurity. The fruits of that research are finally being realized in the form of enterprise-grade defense systems: systems that can automatically learn and adapt to the ever-evolving landscape of cyber threats. This transition is evidenced by the rapid growth in the number of artificial intelligence (AI)-based cybersecurity startups, and traditional vendors in this space have also begun to commercialize machine learning-based security products.
Major cybersecurity breaches are growing in number and severity. High-profile attacks in 2017, such as the Equifax breach, touched millions of people in the U.S. The Equifax breach alone compromised the sensitive information of more than 140 million people. The data included full names, addresses, Social Security numbers, dates of birth, and even driver’s license numbers.
Computer systems make our lives so much easier, but a tradeoff is that almost everyone’s personal information resides in a database somewhere – meaning we’re all at risk for these types of breaches. Banks, credit card providers, telecom companies, and many others spend vast amounts of time and money protecting data, but the risk can never be reduced to zero.
Financial institutions aren’t the only businesses whose compromise by malicious actors can lead to widespread identity theft. As our culture increasingly relies on smartphone applications, there is a growing chance that a major cyber breach could affect all of us. Just a few months ago, the ride-sharing titan Uber, currently valued at tens of billions of dollars, was the victim of a sophisticated attack that exposed the personal information of 57 million users and drivers. The hackers were able to steal this sensitive information without directly compromising Uber’s internal infrastructure: they targeted a GitHub repository that Uber’s engineers had been using to store source code. This type of dependency between businesses means that even if your primary service provider has the strongest cybersecurity infrastructure possible, your data may still not be safe.
The Internet of Things (IoT) will drastically increase the number of gadgets we need to protect in the years to come. Home automation, self-driving cars, and fleets of drones delivering Amazon packages right to our doorsteps will require robust cybersecurity solutions. Often the massive leaps forward, like autonomous vehicle technology, aren’t matched by equal progress in security. Every year at DEF CON, sobering examples of a skilled hacker’s ability to cause massive disruption are put on display. Just this year, hackers were able to compromise a Tesla Model S and disable its brakes using the onboard Wi-Fi system. This vulnerability was immediately fixed, but an AI system could have helped prevent a malicious user from accessing the vehicle’s internal systems in the first place.
Early machine learning approaches to cybersecurity produced many false alarms, overwhelming security analysts and leading some security practitioners to conclude that traditional methods are best. This is no longer the case: machine learning-powered solutions are now more accurate and constitute an important piece of a defense-in-depth strategy.
Advancements in the machine learning algorithms themselves, coupled with increasing access to cheap computing power, have led to an explosion of solutions. Machine learning has been used for intrusion detection, botnet detection, malware analysis, and the fusion of cyber threat intelligence; researchers have even used neural networks to crack passwords.
This is a perfect time for machine learning security practitioners to leverage the advancements DataRobot has made in automated machine learning. It is very difficult to find a data scientist with a machine learning, statistics, and programming background who also has a deep understanding of cybersecurity. A good security analyst understands how vulnerabilities can be exploited at each layer of the OSI network model, which requires a solid understanding of the communication protocols and software their enterprise uses. They must also be capable of correlating events in massive amounts of packet capture data.
These security professionals have a tremendous depth of domain expertise, making them ideal candidates to use an automated machine learning tool like DataRobot. Our goal is to democratize data science, so that domain expertise, rather than data science expertise, becomes the most important ingredient in producing a good model.
DataRobot is making strides in its anomaly detection capabilities and will be including advanced time-series analysis soon.
Security data is complex and subtle, and malicious patterns may be spread out over time, which makes it hard to choose the proper algorithm and its associated parameters. DataRobot’s automated machine learning tools can make that process very simple so that more time can be spent on feature engineering, which is far more valuable with this type of data.
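To make the feature engineering point concrete, here is a minimal Python sketch of turning time-stamped events into per-source features that capture activity spread out over time. It is an illustration only, not DataRobot's method or API: the event schema (timestamp in seconds, source address, success flag), the function name windowed_failure_features, and the five-minute window are all assumptions made for the example.

```python
from collections import defaultdict


def windowed_failure_features(events, window_seconds=300):
    """Summarize failed attempts per source over a sliding time window.

    `events` is an iterable of (timestamp_seconds, source, succeeded)
    tuples. This schema is hypothetical, chosen only for illustration.
    Sources with no failed attempts do not appear in the result.
    """
    # Collect the timestamps of failed attempts for each source.
    failures = defaultdict(list)
    for timestamp, source, succeeded in events:
        if not succeeded:
            failures[source].append(timestamp)

    features = {}
    for source, times in failures.items():
        times.sort()
        start = 0
        max_in_window = 0
        for end, timestamp in enumerate(times):
            # Advance the left edge until the window spans at most
            # `window_seconds`.
            while timestamp - times[start] > window_seconds:
                start += 1
            max_in_window = max(max_in_window, end - start + 1)
        features[source] = {
            "total_failures": len(times),
            "max_failures_in_window": max_in_window,
        }
    return features


# Made-up sample data: one source fails three times within five minutes.
sample_events = [
    (0, "10.0.0.1", False),
    (100, "10.0.0.1", False),
    (200, "10.0.0.1", False),
    (1000, "10.0.0.1", False),
    (50, "10.0.0.2", True),
    (60, "10.0.0.2", False),
]
features = windowed_failure_features(sample_events)
```

Features like the burst count here are the kind of input a domain expert would know to build, and they can then be handed to whichever modeling tool selects the algorithm and its parameters.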
Good machine learning solutions complement firewalls, anti-virus software, and security analysts, learning from them to become more effective. Automated machine learning is the future and will empower security researchers to build better solutions, faster.
About the Author
Kembey Gbarayor is the General Manager for Cyber Security at DataRobot. He works closely with Product, Marketing, Sales, and Client-Facing Data Scientists to deepen and expand DataRobot’s customer base and partnerships in the cybersecurity industry. A computer scientist by training, his background is in machine learning, quantitative finance, and cybersecurity. His professional experience spans the commercial and federal sectors, with previous roles at Goldman Sachs, at government intelligence agencies, and as Chief Technology Officer of a cyber data science startup.