BRIEFING: ThoughtWorks' QTB on Big Data
Some notes from this quarter's technology briefing from ThoughtWorks. This session's topic was "Big Data". I was pleased with the choice of topic as I am interested in Big Data and travel, especially how they can be used by my clients and to enhance the product that I work on.
Caitlin McDonald from Twitter has also created a Storify story from tweets during the event (with the added bonus that I am in the background of someone's photo).
Session
The main speaker was David Elliman with Ashok Subramanian. David has also written a blog post called The Big in Big Data Misses the Point that presents some of the content covered, or see the full presentation in English or German. The session started by looking at the origin of the "information explosion" and how, in the 1940s, people were starting to worry about the miles of shelf space that would be needed by 2000 to store all the books produced. This was contrasted with the explosion of multimedia information produced now, for example pictures from camera phones or the daily output from the Large Hadron Collider.

Next, basic architecture such as MapReduce was explained, along with what makes "Big Data" - variety, volume and velocity - but also that it's what you have in your data that matters: you don't always need a large data set. Sampling is important!
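To make the MapReduce idea mentioned above concrete, here is a minimal single-machine word-count sketch. This is illustrative only: real frameworks such as Hadoop distribute the map, shuffle and reduce phases across many machines, and the function names here are my own, not from the talk.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big thinking", "data beats opinion"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts["big"] == 2, counts["data"] == 2
```

The appeal of the model is that map and reduce are both trivially parallelisable, which is what lets the same pattern scale from this toy example to cluster-sized data sets.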
Another point was that velocity and volume are inversely related: low volumes allow more real-time analysis, while very large volumes push towards batch processing. Sorting and preparing data takes around 80% of the time, so understanding the data and the models is key. Don't expect to get a data dump and have instant insight; there is no single answer or model, and it takes a data scientist to understand the question and work through the models.
The process described for getting successful insight was:
- Start small
- Start with "?"
- Iteratively follow the value
"Master data management is the enemy of innovation"

Overall, a good session highlighting the importance of good analytics in getting insight from data. Having read their big data blog, I was surprised by this, as they prefer the term Big Data Analytics.