Hadoop news

Top Stories

DALLAS, Aug. 21, 2014 /PRNewswire-iReach/ -- Amid the proliferation of real-time data from sources such as mobile devices, web, social media, sensors, log files and transactional applications, Big Data has found a host of vertical market applications, ranging from fraud detection to R&D. (Photo: http://photos.prnewswire.com/prnh/20140821/138541) Key findings from the report "Big Data Market: 2014 – 2020 – Opportunities, Challenges, Strategies, Industry Verticals & Forecasts": In 2014, Big Data vendors will pocket nearly $30 billion from hardware, software and professional services revenues. Big Data investments are further expected to grow at a CAGR of nearly 17% over the next six years, eventually accounting for $76 billion by the end of 2020. The market is ripe for acquisitions of pure-play Big Data startups as competition heats up between IT incumbents. Nearly every large-scale IT ven... (more)

Big Data Top Ten | @CloudExpo [#BigData]

What do you get when you combine Big Data technologies like Pig and Hive? A flying pig? No, you get a "Logical Data Warehouse." My general prediction is that Cloudera and Hortonworks are both aggressively moving to fulfill a vision that looks a lot like Gartner's "Logical Data Warehouse": namely, "the next-generation data warehouse that improves agility, enables innovation and responds more efficiently to changing business requirements." In 2012, Infochimps (now CSC) leveraged its early use of stream processing, NoSQL databases, and Hadoop to create a design pattern that combined real-time, ad-hoc, and batch analytics. This concept of combining best-in-breed Big Data technologies will continue to advance across the industry until the entire legacy (and proprietary) data infrastructure stack is replaced with a new (and open) one. As this is happening, I predi... (more)

How Enterprise Big Data Will Affect Organizations in 2012

As the new year begins, global companies face the coming year's most prominent IT and business challenge: Big Data. The focus for IT will be to provide high-performance analytics capabilities at the lowest cost, as business users need to tap into volumes of multi-structured data about their customers and markets to gain competitive advantage. RainStor, a provider of Big Data management software, has released five predictions focused on how enterprise Big Data will affect organizations in 2012. Based on client and partner experience, market research and conversations with industry experts, here are RainStor's five predictions for Big Data in 2012. Prediction #1: Big Data will Transition from Technology "Buzz" to a Real Business Challenge Affecting Many Large Global Enterprises. Big Data is largely centered on leveraging the open source Apache Hadoop analytics platform... (more)

The Big Data Revolution

For many years, companies collected data from various sources that often found its way into relational databases like Oracle and MySQL. However, the rise of the Internet, Web 2.0, and, more recently, social media triggered an enormous increase in both the amount of data created and the variety of its types. No longer was data relegated to types that fit easily into standard data fields. Instead, it now came in the form of photos, geographic information, chats, Twitter feeds, and emails. The age of Big Data is upon us. Big Data Beginnings A study by IDC titled "The Digital Universe Decade" projects a 45-fold increase in annual data by 2020. In 2010, the amount of digital information was 1.2 zettabytes (1 zettabyte equals 1 trillion gigabytes). To put that in perspective, 1.2 zettabytes is the equivalent of a full-length episode of "24" running continuously for 125 million years, ac... (more)

Internet of Things, Fast Data vs. Big Data

Back when we were doing DB2 at IBM, there was an important older product called IMS which brought in significant revenue. With another database product coming (based on relational technology), IBM did not want any cannibalization of the existing revenue stream. Hence we coined the phrase "dual database strategy" to justify the need for both DBMS products. In a similar vein, several vendors are concocting all kinds of terms and strategies to justify newer products under the banner of Big Data. One such phrase is Fast Data. We all know the 3Vs associated with the term Big Data: volume, velocity and variety. It is the middle V (velocity) that says data is not static but is changing fast, like stock market data, satellite feeds, or even sensor data coming from smart meters or an aircraft engine. The question has always been how to deal with this type of changing data (as ... (more)

Big Data Is All the Rage. Why?

On Monday, December 5, Bob Gourley went on the Enterprise CIO Forum to explain Big Data and why it matters. First, he defined Big Data simply as the data your organization cannot currently analyze. Though some technologists give more precise definitions, this sums up the challenge enterprises now face. If you can deal with all of your data now, you don't have a Big Data problem; but as soon as you have more data than you can effectively manage to find the answers you need fast enough to use them, you need a Big Data solution. Structured data and relational databases can also be Big Data, but what we're really talking about is the type and volume of information that exceeds traditional methods. New solutions include MapReduce, originally developed at Google to analyze and index the entire Internet, and Hadoop, which grew out of those new methods. We see Big Data so... (more)

Examining the True Cost of Big Data

The good news about the Big Data market is that we generally all agree on the definition of Big Data: data with volume, velocity and variety that businesses need to collect, store, manage and analyze in order to derive business value, otherwise known as the "4 V's." However, the problem with such a broad definition is that it can mean different things to different people once you start to put some real values next to those V's. Let's be honest: volume can be a different thing to different organizations. To some it is anything above 10 terabytes of managed data in their BI environment, and to others it is petabyte scale and nothing less. Likewise, velocity can be multi-billions of daily records coming into the enterprise from various external and internal networks. When it really comes down to it, each business situation will be qu... (more)

Lessons Learned from Real-World Big Data Implementations

In the past few weeks I visited several Cloud and Big Data conferences that provided me with a lot of insight. Some people only consider the technology side of Big Data technologies like Hadoop or Cassandra. The real driver, however, is a different one. Business analysts have discovered Big Data technologies as a way to leverage tons of existing data and ask questions about customer behavior and all sorts of relationships to drive business strategy. By doing that, they are pushing their IT departments to run ever bigger Hadoop environments and ever faster real-time systems. What's interesting from a technical side is that ad-hoc analytics on existing data is allowed to take some time. However, ad-hoc implies people are waiting for an answer, meaning we are talking about minutes and not hours. Another interesting insight is that Hadoop environments are never static or standalo... (more)

Cloud Computing and Big Data in 2013: What's Coming Next?

What changes in the cloud computing and big data landscape should we be expecting in 2013? In this article we offer a round-up of industry experts' opinions, as they were asked by Cloud Expo / BigDataExpo Conference Chair Jeremy Geelan to preview the fast-approaching year ahead. Topics include: 2013 Will Be The Year of Big Data | The Internet of Things | Cloud To The Rescue (DR) | SSD. John Engates (@jengates), CTO of Rackspace Hosting: now its CTO, John joined Rackspace in August 2000, just a year after the company was founded, as VP of Operations, managing the datacenter operations and customer-service teams. Two years later, when Rackspace decided to add new services for larger enterprise customers, he created and helped develop the Intensive Hosting business unit. Most recently, he has played an active role in the evolution and evangelism of Rackspace's cloud computing strategy an... (more)

Big Data: New Ways to Hadoop with R

Today, there are two main ways to use Hadoop with R and big data: 1. Use the open-source rmr package to write map-reduce tasks in R (running within the Hadoop cluster - great for data distillation!). 2. Import data from Hadoop to a server running Revolution R Enterprise, via HBase, ODBC (for high-performance Hadoop/SQL interfaces), or by streaming data directly from HDFS to ScaleR's big-data predictive algorithms. And now there are even more Hadoop platforms supported for use with Revolution R Enterprise: Cloudera CDH3 or CDH4; IBM BigInsights 2 (new!); Hortonworks Data Platform 1.2 (new!); and Intel's Distribution for Hadoop (announced today). And by the end of the year, there will be a third way to use Hadoop with R: 3. Leave the data in Hadoop and use ScaleR's "in-Hadoop predictive analytics." We announced today that we are jointly developing in-Hadoop predictive analy... (more)
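Option 1 above, running map-reduce tasks inside the cluster, is done with the rmr package in R. As a rough, language-neutral sketch of the same pattern, the snippet below shows a mapper/reducer pair that could run under Hadoop Streaming; the input layout, script name and invocation are illustrative assumptions, not part of the Revolution R tooling described in the article.

```python
#!/usr/bin/env python3
# Hypothetical Hadoop Streaming sketch: aggregate a numeric field by key.
# The "key,value" CSV layout and the file name are assumptions for illustration.
import sys

def mapper():
    """Emit 'key<TAB>value' pairs; Hadoop shuffles and sorts them by key."""
    for line in sys.stdin:
        parts = line.strip().split(",")
        if len(parts) < 2:
            continue  # skip malformed rows
        print(f"{parts[0]}\t{parts[1]}")

def reducer():
    """Sum values for each key; input arrives already sorted by key."""
    current_key, total = None, 0.0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        try:
            value = float(value)
        except ValueError:
            continue
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0.0
        total += value
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    # One possible invocation (paths and jar location are assumptions):
    #   hadoop jar hadoop-streaming.jar \
    #     -input /data/in -output /data/out \
    #     -mapper "python3 distill.py map" \
    #     -reducer "python3 distill.py reduce" -file distill.py
    mapper() if sys.argv[1:] == ["map"] else reducer()
```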

Hadoop and Big Data Easily Understood - How to Conduct a Census of a City

BigData (and Hadoop) are buzzwords and growth areas of computing; this article will distill the concepts into easy-to-understand terms. As the name implies, BigData is literally "big data," or "lots of data," that needs to be processed. Let's take a simple example: the city council of San Francisco is required to take a census of its population - literally, how many people live at each address. There are city employees who are employed to count the residents. The city of Los Angeles has a similar requirement. Consider two methods to accomplish this task: 1. Request all the San Francisco residents to line up at City Hall and be processed by the city employees. Of course, this is very cumbersome and time consuming because the people are brought to City Hall and processed one by one - in scientific terms, the data are transferred to the processing node. The people have ... (more)
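The teaser is cut off before describing the second method, but the standard Hadoop answer is the reverse of method 1: send the employees (the computation) out to the neighborhoods (the data) and bring back only the small per-block tallies. As a toy sketch of that idea, with addresses and function names invented purely for illustration, the Python snippet below counts residents block by block (the map step) and then merges the small tallies (the reduce step).

```python
# Toy, in-memory illustration of the census analogy (not a real Hadoop job):
# each "neighborhood" block is counted where it lives (map), and only the
# small per-address tallies travel to be combined (reduce).
# The sample addresses are made up for illustration.
from collections import Counter
from functools import reduce

# Each block stays "local": a list of residents' home addresses.
neighborhood_blocks = [
    ["12 Main St", "12 Main St", "7 Oak Ave"],    # counted by employee 1
    ["7 Oak Ave", "101 Pine Rd", "12 Main St"],   # counted by employee 2
]

def count_block(addresses):
    """Map step: tally residents per address within one block only."""
    return Counter(addresses)

def merge_counts(total, block_count):
    """Reduce step: fold the small per-block tallies into a city-wide census."""
    total.update(block_count)
    return total

census = reduce(merge_counts, map(count_block, neighborhood_blocks), Counter())
print(dict(census))
# {'12 Main St': 3, '7 Oak Ave': 2, '101 Pine Rd': 1}
```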