Hadoop
Introduction
Hadoop is an open-source software platform, written in Java, for distributed storage and distributed processing of big data. It consists of several components that work together to load, process, and store big data.
Hadoop Architecture
The Hadoop architecture consists mainly of four components:
- MapReduce
- Hadoop Distributed File System (HDFS)
- Yet Another Resource Negotiator (YARN)
- Common Utilities (or Hadoop Common)
MapReduce
Note: For more information, read about MapReduce.
MapReduce is a core component of the Hadoop platform. Its main function is to split large amounts of data into smaller chunks and distribute those chunks across multiple servers for massively parallel processing. This architecture lets a single large workload be split into many smaller tasks whose results are later recombined into a single data set.
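The split, map, shuffle, and reduce flow described above can be sketched on a single machine with the classic word-count example. This is a minimal illustration in Python, not Hadoop's Java MapReduce API: the function names are hypothetical, and a thread pool stands in for the cluster servers that would run map tasks in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def split_into_chunks(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Split the input into smaller chunks (Hadoop would use input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk: List[str]) -> List[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: recombine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: List[str], chunk_size: int = 2, workers: int = 4) -> Dict[str, int]:
    chunks = split_into_chunks(lines, chunk_size)
    # Hypothetical stand-in: threads play the role of servers running map tasks in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    text = [
        "Hadoop stores big data",
        "MapReduce processes big data",
        "YARN schedules jobs",
    ]
    print(word_count(text))
```

In real Hadoop the framework takes care of splitting the input and shuffling intermediate pairs between nodes; the developer mainly supplies the map and reduce logic.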
Hadoop Distributed File System (HDFS)
Note: For more information, read about the Hadoop Distributed File System (HDFS).
The Hadoop Distributed File System provides the storage layer of a Hadoop cluster. It is designed to deliver scalable, highly available storage on commodity hardware for distributed processing and query workloads.
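HDFS achieves this by cutting files into fixed-size blocks and storing several replicas of each block on different DataNodes. The sketch below is a simplified Python illustration of that idea, assuming the default 128 MB block size (`dfs.blocksize`) and replication factor of 3 (`dfs.replication`) used by Hadoop 2.x and 3.x. The round-robin placement and all function and node names are hypothetical; real HDFS uses a rack-aware placement policy.

```python
from dataclasses import dataclass
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize in Hadoop 2.x/3.x (128 MB)
REPLICATION = 3                  # default dfs.replication


@dataclass
class Block:
    index: int
    offset: int
    length: int


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Block]:
    """Cut a file into fixed-size blocks; the last block may be smaller."""
    return [
        Block(i, offset, min(block_size, file_size - offset))
        for i, offset in enumerate(range(0, file_size, block_size))
    ]


def place_replicas(blocks: List[Block], datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes in round-robin order.

    Simplified assumption: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement: Dict[int, List[str]] = {}
    for block in blocks:
        start = block.index % len(datanodes)
        placement[block.index] = [
            datanodes[(start + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a hypothetical 300 MB file
    nodes = ["dn1", "dn2", "dn3", "dn4"]           # hypothetical DataNode names
    for block_id, replicas in place_replicas(blocks, nodes).items():
        print(block_id, blocks[block_id].length, replicas)
```

Because every block lives on three different nodes, losing any single DataNode still leaves at least two copies of each block, which is where the availability comes from.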
Yet Another Resource Negotiator (YARN)
Note: For more information, read about Hadoop: Yet Another Resource Negotiator (YARN).
YARN is the framework that MapReduce runs on top of. It performs two operations: job scheduling and resource management.
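The two operations can be illustrated with a toy model in Python. In this hedged sketch, resource management means tracking the free memory and vcores on each node, and job scheduling means granting queued container requests in first-in, first-out order. The classes `ToyResourceManager`, `Node`, and `ContainerRequest` are hypothetical and are not YARN's API; YARN's real pluggable schedulers (FIFO, Capacity, and Fair) apply richer policies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    app_id: str
    memory_mb: int
    vcores: int


class ToyResourceManager:
    """Hypothetical, heavily simplified stand-in for a YARN ResourceManager."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes                                # resource management: node capacities
        self.queue: Deque[ContainerRequest] = deque()     # job scheduling: pending requests

    def submit(self, request: ContainerRequest) -> None:
        self.queue.append(request)

    def _find_node(self, req: ContainerRequest) -> Optional[Node]:
        # Return the first node with enough free memory and vcores, if any.
        for node in self.nodes:
            if node.free_memory_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                return node
        return None

    def schedule(self) -> List[Tuple[str, str]]:
        """Grant requests strictly in FIFO order; stop at the first one that does not fit."""
        allocations: List[Tuple[str, str]] = []
        while self.queue:
            req = self.queue[0]
            node = self._find_node(req)
            if node is None:
                break  # the head of the queue waits until resources are freed
            node.free_memory_mb -= req.memory_mb
            node.free_vcores -= req.vcores
            allocations.append((req.app_id, node.name))
            self.queue.popleft()
        return allocations


if __name__ == "__main__":
    rm = ToyResourceManager([Node("nm1", 4096, 4), Node("nm2", 2048, 2)])
    for app, mem in [("app-1", 3072), ("app-2", 2048), ("app-3", 2048)]:
        rm.submit(ContainerRequest(app, mem, 1))
    print(rm.schedule())
```

With these hypothetical numbers, the first two requests are placed and the third stays queued because no node has 2048 MB left, showing how scheduling decisions depend on the resources being tracked.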
Deployment
Big Data Europe, a project funded under the EU's Horizon 2020 program, offers open-source resources for deploying big data tools such as Hadoop. These resources help when deploying Hadoop on a cluster of machines or on a single development machine, and they include the project's Hadoop Docker container, which can be used to run Hadoop on a single machine.
References
Web Links
- Wikipedia contributors. "Apache Hadoop". 2023. wikipedia.org.
- Big Data Europe Project. "Big Data Europe". GitHub.
- Big Data Europe Project. "Big Data Europe Hadoop Docker Container". GitHub.