GTechEd - powered by GTechnosoft (Geetika Technosoft Pvt Ltd) : November 2014

Monday, November 17, 2014

A Quick Overview - Apache Spark

Apache Spark

Apache Spark is an open-source parallel processing framework that lets users run large-scale data analytics applications across clusters of computers.

Apache Spark can process data from a variety of data repositories, including the Hadoop Distributed File System (HDFS), NoSQL databases and structured data stores such as Apache Hive. Spark supports in-memory processing to boost the performance of big data analytics applications, but it can also fall back to conventional disk-based processing when data sets are too large to fit into the available system memory.
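To make the in-memory idea concrete, here is a minimal sketch in plain Python. It is not Spark's API: the `Dataset` class, its `cache()` method and the `read_from_disk` loader are hypothetical names invented for this illustration. The sketch only shows why keeping a data set in memory helps when the same data is used by more than one computation.

```python
from typing import Callable, Iterable, List, Optional


class Dataset:
    """Toy stand-in for a data set used by several computations (NOT Spark's API)."""

    def __init__(self, loader: Callable[[], Iterable[int]]) -> None:
        self._loader = loader                    # simulates an expensive read from disk
        self._cached: Optional[List[int]] = None
        self._keep_in_memory = False
        self.loads = 0                           # how many times the loader actually ran

    def cache(self) -> "Dataset":
        """Request that the records stay in memory after the first load."""
        self._keep_in_memory = True
        return self

    def _records(self) -> List[int]:
        if self._cached is not None:
            return self._cached                  # served from memory, no reload
        self.loads += 1
        records = list(self._loader())
        if self._keep_in_memory:
            self._cached = records
        return records

    def total(self) -> int:
        return sum(self._records())

    def count(self) -> int:
        return len(self._records())


def read_from_disk() -> Iterable[int]:
    # Hypothetical source; a real job would read from HDFS or a similar store.
    return range(1, 6)


# Without caching, each computation re-reads the source.
uncached = Dataset(read_from_disk)
uncached_total, uncached_count = uncached.total(), uncached.count()

# With caching, the second computation reuses the in-memory copy.
cached = Dataset(read_from_disk).cache()
cached_total, cached_count = cached.total(), cached.count()

print(uncached.loads, cached.loads)
```

Both versions compute the same results; the only difference is how often the source is read. In a real cluster the saved work would be disk and network I/O rather than a function call.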

The technology was initially designed in 2009 by researchers at the University of California, Berkeley, as a way to speed up processing jobs in Hadoop systems. Spark became a top-level project of the Apache Software Foundation in February 2014, and Version 1.0 of Apache Spark was released in May 2014. Spark provides programmers with a potentially faster and more flexible alternative to MapReduce, the software framework that early versions of Hadoop were tied to. Spark's developers say it can run jobs up to 100 times faster than MapReduce when the data is processed in memory, and up to 10 times faster on disk.
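For readers who have not worked with MapReduce, the sketch below walks through the programming model it is named after, using a word count in plain Python. It is an illustration only and uses neither the Hadoop nor the Spark API; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on a collection of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


sample = ["Spark runs fast", "Spark runs in memory"]
counts = word_count(sample)
print(counts)
```

In Hadoop MapReduce, a multi-step analysis is usually expressed as a chain of such jobs, with each job writing its results to disk before the next one starts. That repeated disk traffic is the overhead Spark aims to reduce by keeping intermediate data in memory where it can.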

In addition, Spark can handle more than the batch processing applications that MapReduce is limited to running. The core Spark engine functions partly as an application programming interface (API) layer and underpins a set of related tools for managing and analyzing data, including a SQL query engine (Spark SQL), a library of machine learning algorithms (MLlib), a graph processing system (GraphX) and streaming data processing software (Spark Streaming).

Apache Spark can run in Hadoop 2 clusters on top of the YARN resource manager; it can also be deployed standalone or in the cloud on the Amazon Elastic Compute Cloud (EC2) service. Its speed, combined with its ability to tie together multiple types of databases and run different kinds of analytics applications, has prompted some proponents to claim that Spark has the potential to become a unifying technology for big data applications.

What is GTechnosoft?

Geetika Technosoft Pvt Ltd (GTechnosoft) is an IT consulting and services company that provides technology and functional consultants to its clients and executes projects with a focus on quality and customer satisfaction. We also provide cloud, or SaaS (Software as a Service), based services to our clients on a subscription model.

Internally, we are a team that brings together vast experience from past companies, united to provide unparalleled value to our clients. Our motto is "Think as an Enterprise, work as a Startup", because we believe in quality above all. Our consultants and executives are not only globally certified technologists but also excellent professionals. We never push a particular technology on you. We first analyze your pain areas by understanding your infrastructure, and only then recommend the most cost-effective, best-in-class solution.

We currently provide services to a wide spectrum of clients all over India, ranging from startups to Fortune 500 companies.