1. Can you give a quick overview of machine learning, and explain why you were so interested in learning more about it?
I've always been interested in computer science. I first heard about machine learning when Chris Williams (Class of 2017) presented a project last spring on his work in the field, and it really intrigued me. Essentially, machine learning is a branch of computer science that studies and creates algorithms that discern patterns and trends in data. For example, every time you enter a search term into Google, Facebook, or YouTube, the recommendations these sites give you are based on these types of algorithms. The machine "learns" what you like, and tweaks itself to represent what it thinks your interests are.
2. How did you get started in your independent study?
I took Ms. Hudson's Software Programming I course in spring 2016, which taught me Python. So I looked for an online course that would give me an introduction to machine learning and was taught in Python. I found one on the educational platform Udacity, which was co-founded by Sebastian Thrun, a former vice president of Google. Thrun actually taught my class, along with data scientist Katie Malone.
3. What specifically did you learn?
I studied how algorithms work, methods to make them run through datasets more quickly and efficiently, and how to write my own. Thrun and Malone introduced the two most commonly applied types of algorithms, supervised learning and unsupervised learning, to provide a good foundation in machine learning. They also talked about the best ways to manipulate data and make it manageable so it works better with the algorithms. Supervised learning is when you give a machine inputs and their corresponding outputs, so it can learn a rule or function that predicts the outputs for inputs it hasn't seen before. For example, you could tell your machine that a projector (input) is heavy (output) and that a feather (input) is light (output), and then ask it to predict whether a backpack is heavy or light. In unsupervised learning, you provide the machine only with inputs, and the algorithm finds the structure or relationships among those inputs on its own. Each algorithm has its own strengths and weaknesses, and each is most effective in different situations. It's up to the programmer to choose the right one, so that the algorithm finds patterns in their particular set of data as efficiently as possible.
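The heavy/light example above can be sketched in a few lines of Python. This is a minimal illustration, not code from the Udacity course: the weights are made-up numbers, and the function names `predict_label` and `split_into_two_groups` are hypothetical. The supervised half uses a nearest-neighbor rule (copy the label of the closest labeled example), and the unsupervised half uses a simple two-group k-means on unlabeled weights; both are just one possible choice of algorithm.

```python
# Toy illustration only: all weights (in kg) are assumed values, not real data.

def predict_label(examples, weight_kg):
    """Supervised: return the label of the labeled example closest in weight."""
    nearest = min(examples, key=lambda ex: abs(ex[0] - weight_kg))
    return nearest[1]


def split_into_two_groups(values, rounds=10):
    """Unsupervised: 1-D k-means with two groups; no labels are provided."""
    low, high = min(values), max(values)  # starting guesses for the group centers
    for _ in range(rounds):
        # Assign each value to whichever center is closer.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the average of its group.
        low = sum(group_low) / len(group_low)
        if group_high:
            high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


# Supervised: inputs paired with outputs (projector is heavy, feather is light).
labeled = [(3.0, "heavy"), (0.005, "light")]
backpack = predict_label(labeled, 5.0)  # a 5 kg backpack, not seen in training

# Unsupervised: inputs only; the algorithm finds the two groups on its own.
unlabeled = [0.005, 0.01, 0.02, 3.0, 4.5, 5.0]
light_group, heavy_group = split_into_two_groups(unlabeled)

print(backpack)
print(light_group, heavy_group)
```

With only two labeled examples the supervised prediction is crude: a lighter backpack could land closer to the feather than to the projector, which is one reason the amount and quality of training data matter.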
4. How is your senior project an example of applying machine learning?
My senior project is the final project for the Udacity class. I'm applying what I learned to "real data": in this case, the huge datasets of emails and financial information from Enron. Enron was one of the 10 biggest companies in the U.S. in 2000, but by 2002, thanks to the actions of a few people in the company who committed corporate fraud, it was bankrupt. I'm trying to use machine learning to determine who actually committed fraud. To do this, I picked algorithms to determine who is a "person of interest" (POI) and who isn't. We know some POIs already because we know who was indicted, settled with the government, or testified in exchange for immunity. That data is used to help the machine learn what these people have in common. The traditional methods of looking at this data searched for individual cases of fraud, so people needed to find evidence or witnesses to verify each one. Machine learning lets you analyze trends or patterns in much larger datasets, such as all company emails, to give you leads that can then be verified with traditional methods. Computers can discern patterns from large sets of data faster than humans, but humans are still needed to discern meaning from whatever those patterns are.
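The idea of learning from known POIs and then flagging leads for human follow-up might look roughly like the sketch below. Everything in it is an assumption made for illustration: the names, the two features, and the numbers are invented and are not taken from the Enron dataset or from the actual project, and the nearest-centroid rule stands in for whichever algorithms were really used.

```python
# Hypothetical sketch: names, features, and values are invented for illustration.

def centroid(rows):
    """Average each feature column across a list of feature rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flag_leads(labeled, unlabeled):
    """Return unlabeled names whose features sit closer to the POI average."""
    poi_center = centroid([f for f, is_poi in labeled if is_poi])
    other_center = centroid([f for f, is_poi in labeled if not is_poi])
    return [
        name
        for name, features in unlabeled.items()
        if squared_distance(features, poi_center)
        < squared_distance(features, other_center)
    ]


# Assumed features per person:
# (fraction of emails exchanged with known POIs, bonus-to-salary ratio)
labeled = [
    ([0.60, 4.0], True),   # known POI
    ([0.50, 5.0], True),   # known POI
    ([0.05, 0.5], False),  # known non-POI
    ([0.10, 1.0], False),  # known non-POI
]
unlabeled = {"person_a": [0.55, 4.5], "person_b": [0.08, 0.7]}

leads = flag_leads(labeled, unlabeled)
print(leads)
```

The output is only a list of leads. As the answer above notes, each one would still need to be verified by people using traditional methods.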
5. In what other fields can you apply the principles of machine learning?
The applications of machine learning, of being able to use machines to find patterns in large sets of data, are far-reaching and growing every day. I worked in a lab at the University of Pennsylvania and used machine learning to analyze how certain proteins influence the alternative splicing of genes, and it can easily be used in other facets of biology and medicine, too. Machine learning is also used to analyze traffic patterns and to match them up with political and demographic data. It's huge in the world of artificial intelligence, since it's teaching machines how to "think" and discern patterns on their own, with minimal human direction. It's also used in finance, to help analyze trades in the stock market. Imagine looking at all that data without an algorithm!
- The Big Room Blog