Big Data is defined by the Gartner Group as information characterized by the following four dimensions: velocity, variety, volume, and veracity. These four dimensions are what make Big Data different from ordinary data.
Velocity refers to how quickly a company must process incoming information. The analysis usually needs to be completed in a short amount of time: today's customer or order data needs to be analyzed before tomorrow.
Volume refers to the amount of data. Big Data works with terabytes and petabytes of data on market information, customer sentiment across social media channels, and input from other enterprise systems (like shipping, credit card processing, etc.).
Variety refers to the fact that Big Data deals with tables in databases as well as unstructured data like video, audio, text, or other random sensor data. This isn’t just nice clean SQL tables. It’s messy data and metadata.
Veracity refers to the trustworthiness of the data. Big Data arrives from many sources of varying quality, so systems must cope with noise, gaps, and uncertainty. As the number of sensors in the world increases, separating signal from noise becomes more important.
The term “Big Data” encompasses not only these four dimensions but also the new technology paradigms and algorithms for processing this data. MapReduce and Hadoop (built on MapReduce) are two important foundations of the Big Data revolution. MapReduce is a programming model that splits a task into pieces that run across many computers at the same time. Hadoop is an open-source framework that stores data across clusters of commodity hardware and runs MapReduce jobs on them, so queries can execute across many servers and databases in parallel.
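To make the MapReduce model above concrete, here is a minimal in-memory word-count sketch in plain Java (the language Hadoop MapReduce jobs are typically written in). The class and method names are illustrative, not Hadoop's actual API; a real job would distribute the map and reduce phases across a cluster.

```java
import java.util.*;
import java.util.stream.*;

public class WordCount {
    // "Map" phase: each input line is turned into (word, 1) pairs.
    // In Hadoop, this function would run in parallel on many machines.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // "Reduce" phase: pairs with the same key are merged by summing counts.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data is big", "data moves fast");
        Map<String, Integer> counts =
                reduce(lines.stream().flatMap(WordCount::map));
        System.out.println(counts); // e.g. {big=2, data=2, is=1, ...}
    }
}
```

The key idea is that both phases are embarrassingly parallel: any machine can map any line, and any machine can reduce any key, which is what lets Hadoop scale the same logic to petabytes.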
Learn Spark internals for working with NoSQL databases, as well as debugging and troubleshooting.
Introduction to Apache Spark
Learn how to use Apache Spark as an alternative to traditional MapReduce processing.
Introduction to Apache Spark in Production
Learn the architecture and internals of Spark, a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
Applying Big Data Technologies
Learn to use big data technologies and understand their tradeoffs.
Big Data Bootcamp
Learn all about Hadoop and Big Data technologies.
Introduction to Apache Flink
Learn scalable batch and stream data processing using Apache Flink.
Introduction to Apache Kafka
Learn how to use the high-throughput, distributed, publish-subscribe messaging system Apache Kafka.
Introduction to Apache Storm
Learn Storm, a distributed framework for real-time stream processing.
Introduction to Apache Zeppelin
Learn Apache Zeppelin, an interactive, web-based notebook for data analytics that provides easy access to data from a variety of big data systems.
Introduction to Apache ZooKeeper
Learn the internals of ZooKeeper and explore how it functions.
Introduction to Cassandra
Learn the ins and outs of Cassandra required to build a Cassandra-based application.
Introduction to GemFire
Learn to use GemFire in high-performance systems to facilitate fast access to data.
Introduction to Redis
Learn to use the NoSQL database Redis.
Engineering Reactive Architecture Using Scala, Akka, Play
Learn Reactive Programming with Scala as a foundation.
Fast Track to Akka with Scala
Learn how to build web applications with Scala and Akka.
Fast Track to Play with Scala
Learn how to build web applications with Scala and Play.
Introduction to Akka Framework
Learn how to quickly build web applications in Scala using the Akka framework.
Introduction to Akka with Java
Learn how to use the Akka Framework with Java to build distributed applications.
Introduction to Play Framework
Learn how to quickly build web applications in Scala using the Play framework.
Introduction to Scala
Learn how to adopt Scala to efficiently build multi-core processing applications.
Introduction to Scala Learning Spike
Learn the fundamentals of the Scala programming language.
Test Driven Development with Scala
Learn how to effectively test Scala based applications.
Learn to maintain/operate a Hadoop cluster.
Learn the fundamentals of the Hadoop platform.
Hadoop for Data Analysts
Learn how to use Hadoop to manage, manipulate, and query large, complex data sets in real time.
Introduction to Administering Hadoop Clusters
Learn how to set up, configure, and administer Hadoop.
Introduction to Hadoop Administration
Learn how to administer and maintain Hadoop.
Introduction to Hadoop for Developers
Learn how to write MapReduce programs using Java.
Introduction to Hadoop for Managers
Understand how Hadoop fits into your infrastructure.