The Data Driven Revolution: An Analysis of Big Data, AI, and Machine Learning
____________________________________________________________________________
Preface: The following research paper was written for Palo Alto Networks' Secure the Future competition (2023).
____________________________________________________________________________
Q: Describe the relationships between Big Data, AI and Machine Learning.
Artificial intelligence is one of the biggest trends of the year; almost everyone has heard of it. Many people hold their own opinions about it: what it can do, what it could be capable of, and whether we should even continue developing it. However, most people don't understand how it works under the hood. AI is intricate, and a wonder once you understand its inner workings.

The foundation of a model begins with big data. Large quantities of data are first cleansed to ensure the information is correct. Then, in many cases, the data is organized into structures such as vector embeddings. The company Pinecone describes how this works: "vector embeddings are a type of data representation that carries within it semantic information that's critical for the AI to gain understanding and maintain a long-term memory they can draw upon when executing complex tasks" (Pinecone, 2023). The AI model is then trained on this data, learning connections and applying them alongside its prior knowledge.

This is also where machine learning comes into play. Machine learning is a subset of AI that focuses on developing algorithms and models, allowing a system to make better predictions and decisions. It works largely on its own, handling much of what we see every day. Tableau, a Salesforce company, points to ML in facial recognition, product recommendations, spam filtering, and social media optimization; it is used by all sorts of companies, from Netflix's "watch next" suggestions to Facebook's picture tagging (Tableau, 2023). By comparison, artificial intelligence uses these algorithms to mimic human intelligence more broadly. We see it most commonly in virtual personal assistants and autonomous robots, where more complex problem solving is required to take in input, understand it, and determine the proper response.

Big data, artificial intelligence, and machine learning are distinct fields, yet each one is crucial to the next step in the chain.
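The vector-embedding idea in the Pinecone quote above can be made concrete with a toy sketch. The three-dimensional vectors and "concepts" below are invented purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, but semantic closeness is still commonly measured the same way, with cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Measure how semantically 'close' two embedding vectors are."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models use far more dimensions).
king = [0.9, 0.8, 0.1]
queen = [0.9, 0.7, 0.2]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen))  # high score: related concepts
print(cosine_similarity(king, apple))  # lower score: unrelated concepts
```

A vector database built on this principle stores millions of such embeddings and, given a query vector, returns the stored vectors with the highest similarity, which is how an AI can "draw upon" a long-term memory.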
As we progress, each of these fields will need its own innovation so that all three can work together for the better.
Q: Explain the core differences between Strong AI and Weak AI. How could strong AI pose a future security threat?
Diving deeper into artificial intelligence, we find that it is itself a broad topic, with models of many sizes and use cases. Weak AI models are designed to do one thing extremely well, such as playing chess. Codemotion describes one such algorithm: "AlphaZero became famous for its quick mastery of the game after only a few hours of self-play training" (Codemotion, 2023). It is impressive to see it master chess, yet it is helpless in the face of other challenges; weak models tend to fail outside their domain, being programmed to excel at only a narrow set of tasks.

Every day brings new advancements, and AI systems grow stronger and more capable. Stronger models, such as GPT-4, are becoming more prevalent. GPT-4 is designed to take in input and respond with appropriate information. It started as a purely text-based chatbot but has since grown to recognize images, and other chat systems, such as Bing Copilot, can even generate images within a conversation. Its ability to understand input continues to progress: GPT-4 can take a meme as input and explain the joke. Its comprehension of code is growing alongside its comprehension of language; it can look at a picture of a website and generate the code to build it. Its neural networks strengthen by the day because their growth is driven by data; the more they ingest and process, the more sophisticated and nuanced their understanding becomes.

This brings great benefit to human progress, but it also raises many issues. Many companies are now working to ensure that AI is ethical and controlled. The threats AI poses are the root reason behind the movement to pause its development. AI is constantly learning from each prompt it receives and from any new data it is given, which poses a risk if it were ever fed private data.
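The claim above that a model's understanding is "driven by data" can be sketched in miniature. The example below is nothing like GPT-4's architecture; it is a toy model with two parameters fitting a line by gradient descent, shown only to make the idea of data-driven training concrete. The hidden rule, learning rate, and epoch count are all invented for illustration:

```python
import random

def train(samples, epochs=200, lr=0.01):
    """Fit y = w*x + b by stochastic gradient descent on (x, y) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y  # prediction error on this example
            w -= lr * err * x      # nudge the parameters to shrink the error
            b -= lr * err
    return w, b

def mean_error(samples, w, b):
    """Average absolute prediction error over a dataset."""
    return sum(abs((w * x + b) - y) for x, y in samples) / len(samples)

random.seed(0)
# The hidden rule the model must discover from data: y = 2x + 1, plus noise.
data = []
for _ in range(200):
    x = random.uniform(-1, 1)
    data.append((x, 2 * x + 1 + random.gauss(0, 0.1)))

w, b = train(data)
print(f"learned w={w:.2f}, b={b:.2f}")  # close to the true 2 and 1
print(f"error before training: {mean_error(data, 0.0, 0.0):.2f}")
print(f"error after training:  {mean_error(data, w, b):.2f}")
```

Each pass over the data nudges the parameters toward the underlying rule; a large language model does conceptually the same thing, only with billions of parameters and vastly more data.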
Once a model has learned data, removing it is very difficult. AI also makes cyberattacks more resilient: it can modify data to slip past filters and learn as it executes attacks. There are other areas of worry as well; misinformation, social engineering, and catfishing will only become more efficient and widespread. These threats are recognized, and plans to mitigate them are already in the works. Our technology will always grow to counter such threats, and we will see new innovations very soon. "Today's 'best practices' lead to dead ends or technical debt; the best paths are new and untried" (Agnihotri, 2023).
____________________________________________________________________________
References:
Schwaber-Cohen, R. (n.d.). Vector database for vector search. Pinecone. https://www.pinecone.io/
Salesforce. (n.d.). Real-world examples of machine learning (ML). Tableau. https://www.tableau.com/learn/articles/machine-learning-examples
Degni, R. (2023, March 1). The ultimate checkmate: AI and chess engines. Codemotion. https://www.codemotion.com/magazine/ai-ml/the-ultimate-checkmate-ai-and-chess-engines/#:~:text=General%20Public%20License).-,AlphaZero,hours%20of%20self-play%20training.
Agnihotri, R. (2023, September 10). Rohit Agnihotri on LinkedIn. https://www.linkedin.com/posts/rohit-agnihotri_todays-best-practices-lead-to-dead-ends-activity-7106658456970878976-BsAr?utm_source=share&utm_medium=member_desktop


