Hadoop news


Top Stories

What changes in the cloud computing and big data landscape should we expect in 2013? This article rounds up the opinions of industry experts whom Cloud Expo / BigDataExpo Conference Chair Jeremy Geelan asked to preview the fast-approaching year ahead. 2013 Will Be The Year of Big Data | The Internet of Things | Cloud To The Rescue (DR) | SSD John Engates | @jengates CTO of Rackspace Hosting Now its CTO, John joined Rackspace in August 2000, just a year after the company was founded, as VP of Operations, managing the datacenter operations and customer-service teams. Two years later, when Rackspace decided to add new services for larger enterprise customers, he created and helped develop the Intensive Hosting business unit. Most recently, he has played an active role in the evolution and evangelism of Rackspace's cloud computing strategy an... (more)

Oracle Fills Another Gap in Its Big Data Offering

When we last left Oracle’s big data plans, there was definitely a missing piece. Oracle’s Big Data Appliance, as initially disclosed at last fall’s OpenWorld, was a vague plan that appeared to be positioned primarily as an appliance that would accompany and feed data to Exadata. Oracle did specify some utilities, such as an enterprise version of the open source R statistical processing program that was designed for multithreaded execution, plus a distribution of a NoSQL database based on Oracle’s BerkeleyDB as an alternative to Apache Hive. But the emphasis appeared to be extraction and transformation of data for Exadata via Oracle’s own utilities that were optimized for its platform. With Oracle’s announcement of general availability of the Big Data Appliance, it is filling in the blanks. As such, Oracle’s plan for Hadoop was competition, not for Cloudera (or Horto... (more)

Is Big Data a Solution in Search of a Problem?

If you look at the predictions made for 2012, you will find a new entry that was not there last year. Be it Gartner, Forrester or McKinsey – “Big Data” finds a place in the prediction. So, what is big data? Is it the next path-breaking technology that will change everything, or is it just hype that will die down after some time? Let us take a realistic look at what the term big data means and what problems it can solve. What is "Big Data"? (The Wikipedia page on Big Data is not that good. The clearest explanation I have found is from O’Reilly Radar – here is the link) Here is a short explanation. Big Data is the name given to the classes of technologies that need to be used when your data volume grows so large that RDBMS technologies can no longer handle it. Big data spans three dimensions (taken from this IBM article): Variety – Big data extends beyond st... (more)

Big Data Predictions for 2013

My prediction for 2013 is that competitive advantage will translate into enterprises using sophisticated Big Data analytics to create a new breed of applications - Intelligent Applications. "It's more than just insights from MapReduce," a CIO from a Fortune 100 company told me. "It's about using data to make our customer touch points more engaging, more interactive, more intelligent." So when you hear about "Big Data solutions," you need to translate that into a new category of "Intelligent Applications." At the end of the day, it's not about people poring through petabytes of data. It's actually about how one turns the data into revenue (or profits). This means that you MUST: 1. Start with the business problem first (preferably one with revenue upside versus cost savings). 2. Determine which data elements you can leverage AFTER #1. 3. Define an analytical three-tier architecture (a... (more)

Hadoop and Big Data Easily Understood - How to Conduct a Census of a City

BigData (and Hadoop) are buzzwords and growth areas of computing; this article will distill the concepts into easy-to-understand terms. As the name implies, BigData is literally "big data" or "lots of data" that needs to be processed. Let's take a simple example: the city council of San Francisco is required to take a census of its population - literally how many people live at each address. There are city employees who are employed to count the residents. The city of Los Angeles has a similar requirement. Consider two methods to accomplish this task: 1. Request all the San Francisco residents to line up at City Hall and be processed by the city employees. Of course, this is very cumbersome and time-consuming because the people are brought to the city hall and processed one by one - in scientific terms the data are transferred to the processing node. The people have ... (more)
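
The excerpt is cut off before its second method, so the sketch below is an illustration only and not the article's own text. It assumes the intended contrast is the usual MapReduce one: instead of moving every record to a single processing node, each partition of the data is tallied where it lives and only the small tallies are merged. The district names, addresses, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical census records: one (district, address) entry per resident.
RESIDENTS = [
    ("Mission", "12 Oak St"),
    ("Mission", "12 Oak St"),
    ("Sunset", "7 Pine Ave"),
    ("Mission", "3 Elm Rd"),
    ("Sunset", "7 Pine Ave"),
    ("Sunset", "7 Pine Ave"),
]


def count_centrally(residents):
    """Method 1: every record travels to one processing node (City Hall)."""
    return Counter(address for _, address in residents)


def count_by_district(residents):
    """Assumed alternative: tally where the data lives, then merge the tallies."""
    # Group records by district to stand in for data stored on separate nodes.
    partitions = {}
    for district, address in residents:
        partitions.setdefault(district, []).append(address)

    # "Map" step: each district counts its own addresses locally.
    local_tallies = [Counter(addresses) for addresses in partitions.values()]

    # "Reduce" step: only the compact per-district tallies are combined.
    total = Counter()
    for tally in local_tallies:
        total.update(tally)
    return total


if __name__ == "__main__":
    print(count_centrally(RESIDENTS))
    print(count_by_district(RESIDENTS))
```

Both functions return the same residents-per-address counts; the difference is what has to move. The first ships every record to one place, while the second ships only one small tally per district.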

Beyond Big Data and @Benioff’s “AI Spring” to the Dawn of Dataware By @MattMcIlwain | @CloudExpo [#BigData]

This article was authored by Matt McIlwain and was originally published on Medium. For more of Matt's writing, you can follow him here! Guest Post: Beyond Big Data and Benioff’s “AI Spring” to the Dawn of Dataware Big Data, AI, Machine Learning, Hadoop, Predictive Analytics — we hear these terms every day from companies such as Cloudera, Trifacta and Dato (formerly GraphLab) that are securing many millions in financing. I believe that 2015 will be the year when the conversation moves from Big Data to the Dataware stack. Over the past twelve months we have seen a lot of companies across the big data spectrum emerge and while the language can be the same, there are clear product categories that have emerged which describe the market opportunity and future growth. This is the Dataware stack. Dataware is the combination of infrastructure, data intelligence systems tha... (more)

Red Hat Unveils Big Data and Open Hybrid Cloud Direction

Red Hat on Wednesday announced its Big Data direction and solutions to satisfy enterprise requirements for highly reliable, scalable, and manageable solutions to effectively run their Big Data analytics workloads. In addition, Red Hat announced that the company will contribute its Red Hat Storage Hadoop plug-in to the Apache Hadoop open community to transform Red Hat Storage into a fully supported, Hadoop-compatible file system for Big Data environments, and that Red Hat is building a robust network of ecosystem and enterprise integration partners to deliver comprehensive Big Data solutions to enterprise customers. Red Hat Big Data infrastructure and application platforms are suited for enterprises leveraging the open hybrid cloud environment. Red Hat is working with the open cloud community to support Big Data customers. Many enterprises worldwide use public cloud... (more)

Big Data and Master Data Management

Master Data Management (MDM) is an important aspect of data governance in enterprises: it enables the development of a "Single Version of Truth" by providing common descriptions for enterprise-wide entities. Need for MDM in Big Data Processing: Before Big Data, enterprises generally managed their transaction data in traditional relational databases. One of the biggest strengths of relational databases is their ability to enforce constraints like check constraints, primary key, foreign key, etc., which ensure that the data captured is of the highest quality. In spite of such support for data integrity, enterprises had duplicates in their master data that resulted in inaccurate results in analytics on that data. For example, an enterprise may target an expensive advertisement campaign for a new product to its existing c... (more)

Big Data Top Ten | @CloudExpo [#BigData]

What do you get when you combine Big Data technologies... like Pig and Hive? A flying pig? No, you get a “Logical Data Warehouse.” My general prediction is that Cloudera and Hortonworks are both aggressively moving toward fulfilling a vision which looks a lot like Gartner’s “Logical Data Warehouse”... namely, “the next-generation data warehouse that improves agility, enables innovation and responds more efficiently to changing business requirements.” In 2012, Infochimps (now CSC) leveraged its early use of stream processing, NoSQL databases, and Hadoop to create a design pattern which combined real-time, ad-hoc, and batch analytics. This concept of combining the best-in-breed Big Data technologies will continue to advance across the industry until the entire legacy (and proprietary) data infrastructure stack is replaced with a new (and open) one. As this is happening, I predi... (more)

Compuware APM Extends Leadership in Big Data

Compuware Corporation on Wednesday announced that Compuware APM for Big Data now offers enhanced support and out-of-the-box dashboards that enable organizations to optimize big data projects through unmatched visibility into Hadoop, NoSQL and Cassandra deployments. Now organizations have deeper insight into big data workloads and transactions to quickly find the root cause of slow jobs and failures in minutes, instead of hours or days. Enhancements for Hadoop enable operations teams to gain deep insight into the most active users in a cluster with automatic profiling of intensive jobs. Problem patterns, including data shuffle across the network, can be quickly identified as well as resource utilization and tracking to enable charge-back models. New enhancements in Compuware APM for Big Data include: Enhanced out-of-the-box, zero configuration dashboards for Hadoop, ... (more)

In 2014 Big Data Investments Will Account for Nearly $30 Billion - Eventually Accounting for $76 Billion by 2020 End

DALLAS, Aug. 21, 2014 /PRNewswire-iReach/ -- Amid the proliferation of real-time data from sources such as mobile devices, web, social media, sensors, log files and transactional applications, Big Data has found a host of vertical market applications, ranging from fraud detection to R&D. Photo - http://photos.prnewswire.com/prnh/20140821/138541 "Big Data Market: 2014 – 2020 – Opportunities, Challenges, Strategies, Industry Verticals & Forecasts" Key Findings: In 2014 Big Data vendors will pocket nearly $30 Billion from hardware, software and professional services revenues. Big Data investments are further expected to grow at a CAGR of nearly 17% over the next 6 years, eventually accounting for $76 Billion by the end of 2020. The market is ripe for acquisitions of pure-play Big Data startups, as competition heats up between IT incumbents. Nearly every large scale IT ven... (more)
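
As a quick sanity check on the key findings above, the sketch below compounds the 2014 figure at the quoted growth rate. It assumes the rounded inputs ($30 billion, 17%, 6 years); the release says "nearly" for both figures, so the result is approximate. The function name `project_value` is illustrative and not from the report.

```python
def project_value(start, annual_growth, years):
    """Compound `start` at `annual_growth` (0.17 means 17%) for `years` years."""
    return start * (1 + annual_growth) ** years


# Rounded inputs taken from the key findings: $30 billion in 2014,
# growing at a 17% CAGR over the 6 years to the end of 2020.
projected_2020 = project_value(30.0, 0.17, 6)

if __name__ == "__main__":
    print(f"Projected 2020 market: ${projected_2020:.1f} billion")
```

With these rounded inputs the projection comes out at roughly $77 billion. That is consistent with the quoted $76 billion once the "nearly $30 billion" and "nearly 17%" qualifiers are taken into account.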