IBM has been involved with Apache Spark, an open source data analytics project, since its inception, and has now upped the ante by releasing some of its own software and adding the technology to several of its products.
IBM announced that it plans to embed Spark into its analytics and commerce platforms and to offer Spark as a service on Bluemix. As part of the commitment, IBM is donating SystemML, its machine learning technology, to the Spark open source ecosystem.
Apache Spark began as a project at the University of California, Berkeley in 2009, and IBM claims it is the fastest growing open source project in history. Federal agencies stand to benefit, as Spark should help them use and manage the massive amounts of data they produce more quickly. IBM is working with NASA and the SETI Institute to analyze terabytes of deep space radio signals using Spark’s machine learning capabilities.
IBM also pointed to the Agriculture Department as a beneficiary. The USDA collects data about farming and food inspections, as well as economic data related to food production. It also has access to weather and agricultural data from around the world. Spark can combine all of those disparate forms and sources of data into a single data stream for analytics.