Thursday, April 7, 2016

Big Data and Security - the next big disruptor?


Last quarter I was invited to a Cloudera sales event in Las Vegas. There were some impressive stats on last year's performance, a lot of enthusiasm and, in particular, a great session from Charles Zedlewski (@zedlewski) outlining some of the product and Apache initiatives coming soon.
Two in particular are now announced 
So far, so good, but these two announcements will make a huge impact on the IT Security market. For some time now there has been little innovation in Security. The main players are all offering incremental enhancements to technology that has been around for years.
Big Data and the Hadoop ecosystem can disrupt (and already have disrupted) the ITSec market. Principally it's a cost/scale dynamic. SIEMs, Vulnerability Management, Configuration Management tools and others are essentially about reacting to events that have already happened. They also use metadata-structured repositories to normalize, correlate and report. Look at any SIEM vendor's details and you will see this common theme: detect and fix something that has already happened.
With Hadoop and its various components and, in particular, the continuing path to maturity in machine learning products, this old style architecture is going to disappear. Sometime between now and 2020 the Enterprise Security Warehouse concept will be widely adopted: all data from all sources poured into a massive data lake (in real time, of course), with an HDFS/Kudu style repository for persistence and machine learning algorithms constantly monitoring what is happening and taking appropriate action as the threats happen, not after they happen. Gartner predicted this back in 2014 so it must be true... http://www.gartner.com/newsroom/id/2778417
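To make the "act as the threats happen" idea concrete, here is a minimal sketch in plain Python. It is not a Hadoop, Kudu or Spark implementation, and the class name, the threshold and the sample feed of failed logins per minute are all hypothetical. It simply scores each event against a running mean and standard deviation the moment it arrives, rather than reporting on it afterwards.

```python
import math


class RunningAnomalyDetector:
    """Flags values that sit far from the running mean of everything seen so far."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # how many standard deviations counts as anomalous
        self.warmup = warmup        # observations to collect before scoring starts
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford's method)

    def observe(self, value):
        """Score the value against history, then add it to the history."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of the mean and squared deviations
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


# Hypothetical feed: failed logins per minute from one source
failed_logins_per_minute = [4, 5, 3, 6, 4, 5, 4, 250, 5]
detector = RunningAnomalyDetector()
alerts = [
    minute
    for minute, value in enumerate(failed_logins_per_minute)
    if detector.observe(value)
]
print(alerts)  # minute indexes that would trigger an immediate response
```

A real Security Warehouse would run far richer models over many sources at once, but the shape is the same: the decision is made per event, in the stream, instead of in a report the next morning.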
In our discussions with clients we see a gradual realization, usually in the biggest clients first, that the old style Security Architectures have failed to keep up and that new architectures built on Big Data ecosystems, and machine learning in particular, offer the greatest potential for the next disruptor. Look at how Splunk has built a $600m business on just this premise, but without the machine learning part.
For an alternative view of ML and Security, read Matt Harrigan's (@mattharrigan) post at TechCrunch: http://techcrunch.com/2016/02/29/machine-learning-is-not-the-answer-to-better-network-security/
What do you think? Is Machine Learning already the big disruptor in Cyber Security?

Thursday, February 11, 2016

Machine Learning and Spark - get ready for the next big disruptor


There are lots of articles, blogs, reports and noise at the moment about Spark and machine learning - driven primarily by the rapid adoption of MLlib (Spark's general machine learning library), which is leading developers to use R and Python in particular for Advanced Analytics. For a great overview go to InfoWorld - 'Why you should use Spark for Machine learning'.
It's generally recognized that Spark has a long way to go before it is fully Enterprise ready. Almost every client I talk to follows a very familiar pattern - they want to try it for speed and scale, they try it, get disappointed (in particular by its scalability) and then decide to wait.
However, when Machine Learning comes into the discussion, Spark adoption is rapid, visible and highly successful. Customers are now recognizing the growing power of Spark MLlib, particularly with the growing number of algorithms it supports. ML has been around since 1979, and more recently the 'not very good' Mahout implementation has led to a lot of disappointing projects.
We don't have space here to go into the details of ML but I notice four key trends that will help customers see strong and rapid time to value in their machine learning projects :-
  1. Customer 360 views are one of the most common Big Data use cases. Using ML, and Spark MLlib in particular, customers can leverage massive data volumes to make product recommendations to customers in real time using ads or other recommendation platforms. ML can take Recommendation and Monetization engines to a whole new level of predictability and relevance in real time.
  2. Similarly in Mobile Networks, ML can be used to predict and manage Network Optimization - a critical cost element in Mobile Network profitability. Think about it like a river. Use ML to maximize the flow of water through the narrowest channels while maintaining speed and volume. Maximum benefit flows from predicting in near real time how the flows (Wireless traffic) should be managed. 
  3. With Geolocation services, massive data volumes and ML, Retailers can tailor specific offers to individuals. Imagine a scenario where you go into a Nordstrom's type store and the Store ML system picks up (from the Store's already installed Mobile App) that you have entered the store. As you wander round the various departments the ML system is rapidly choosing products you will be interested in (and presenting them on your mobile device) and, when you press the 'Get Help' button on your phone, the Sales assistant glides over, already armed with all your previous purchase history and a set of suggestions on what to buy. They open the conversation with 'Good Morning Mr. Bennett, let's take a look at that Emile Staub Cocotte that you looked at last time you were here'...
  4. Data Wrangling is still a big issue, and Machine Learning based companies like Trifacta are starting to get a lot of traction inside the Enterprise. Once large companies understand how ML apps can change their entire Big Data ecosystem, ML will become a mainstream technology during 2016.
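As a rough illustration of the recommendation idea in trends 1 and 3, here is a small sketch in plain Python. It is a deliberately simple stand-in (counting which items are bought together), not Spark MLlib and not what any particular retailer runs; the function names and the purchase history are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, owned, top_n=2):
    """Suggest the items most often bought alongside what the customer already owns."""
    scores = Counter()
    for item in owned:
        for other, count in co[item].items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the result is stable
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history, one list per past basket
purchase_history = [
    ["cocotte", "oven mitts", "cookbook"],
    ["cocotte", "oven mitts"],
    ["cocotte", "cookbook", "wooden spoon"],
    ["wooden spoon", "cookbook"],
]
co = build_cooccurrence(purchase_history)
suggestions = recommend(co, {"cocotte"})
print(suggestions)
```

At real volumes you would swap the counting for a proper model trained on a cluster, but the input (past behaviour) and the output (a short ranked list, ready when the customer walks in) look much the same.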
Want to know more about Machine Learning? Take a look at this InfoWorld slideshare.
What do you think? Is Machine Learning the next big disruptor?

Tuesday, January 19, 2016

Would you like a Monopoly with your Data or just a cartel?


Two weeks ago I was fortunate to attend the CES show in Las Vegas. I experienced the joys of 3x Uber surge pricing on Thursday afternoon as well as some great discussions. Two things struck me from CES :-
  1. Try as they might, no-one can make the Internet of Things very interesting. I visited a home automation demo in the Qualcomm booth. The demo was good and you can see some of the content here in this promo from Qualcomm. But the problem is - the technology is not new, not very innovative and has been around quite a while. More interesting was the La Poste booth, where there was a platform for IoT traffic (Hub Numerique, in French) that is being used by 20 innovative startups (from smart shoes to intelligent drinking glasses). Using the platform idea, La Poste takes away the complexity and frees startups to innovate.
  2. Even though it is not very interesting, IoT is really going to take off when bandwidth makes the next step. 4G mobile networks are not powerful enough to handle the vast amounts of data that IoT will generate. 5G - which is more of an idea than an emerging standard - will provide mobile networks that are 100 times faster than 4G (download an HD movie in 1 second). So my thought is that IoT is coming, but maybe not quite as fast as the vendors would like it to. Tech Republic have a great post on 5G and its background, including the usual issue about spectrum for new wireless networks.
However, it will happen and, when it does, there will be an enormous explosion in data volumes. Already we create 5 Exabytes every two days; by 2019 it will be 5 Exabytes every day (including 1 Exabyte on mobile platforms alone). By way of comparison - from the beginning of civilization to 2003 the human race created... 5 Exabytes. Now we create the same volume of data in 48 hours. And this growth is happening without massive IoT adoption. Cisco's annual forecast is informative for those of us who like lots of numbers. This year's will be out in February and it will be interesting to see what the growth will be to 2020.
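For those who like the numbers spelled out, here is a quick back-of-the-envelope calculation in Python of what that jump implies as a yearly growth rate. It takes 2016 as the starting year, which is my assumption, and the function name is just for illustration.

```python
def implied_annual_growth(start_rate, end_rate, years):
    """Compound annual growth rate that takes start_rate to end_rate over the given years."""
    return (end_rate / start_rate) ** (1 / years) - 1


eb_per_day_now = 5 / 2   # 5 Exabytes every two days
eb_per_day_2019 = 5 / 1  # 5 Exabytes every day
years = 2019 - 2016      # assumption: "now" means 2016

growth = implied_annual_growth(eb_per_day_now, eb_per_day_2019, years)
print(f"Implied growth: {growth:.1%} per year")
```

Doubling the daily volume in three years works out to roughly 26% growth a year, compounding, and that is before any large-scale IoT adoption is added on top.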
So, all this data - who will control it? This is a key question for markets, customers, solution providers and the whole Big Data ecosystem. In the European Union, for example, the online search market is completely dominated by Google with a 90% share, which has given birth to a new term: 'Data dominance'. Just as the old monopolies and cartels of Oil, Railways, Steel making and so on shaped the 19th and 20th Century economies, so - the theory goes - Data dominance is the key metric for commercial success in the 21st Century. It then becomes clear that even if you have a free market with limited regulation and access to as much information as you want, whoever controls that information and data controls everything. So Big Data becomes a platform for centralization and consolidation of market power.
What do you think - is Data Dominance important? Do you think we should be concerned? For a (Vodafone sponsored) survey on this question and some interesting insights into European perspectives on Big Data, take a look at this recently published report or the summary at Forbes.com.