Tuesday, 15 July 2014

BRIEFING: ThoughtWorks' QTB on Big Data

Some notes from this quarter's technology briefing from ThoughtWorks. This session's topic was "Big Data". I was pleased that the topic was chosen as I am interested in Big Data and travel, especially how it can be used by my clients and to enhance the product that I work on.

Caitlin McDonald from Twitter has also created a Storify story from tweets during the event (with the added bonus that I am in the background of someone's photo).

Session

The main speaker was David Elliman with Ashok Subramanian. David has also written a blog post called The Big in Big Data Misses the Point, which presents some of the content covered; the full presentation is also available in English or German. The session started by looking at the origin of the "information explosion" and how in the 1940s people were starting to get worried about the miles of shelf space that would be needed by 2000 to store all the books produced. This was contrasted with the explosion of multimedia information produced now, for example pictures from camera phones or the daily output from the Large Hadron Collider.

Next, basic architecture such as MapReduce was explained, along with what makes "Big Data": variety, volume and velocity. The point was also made that what matters is what you have in your data, so you don't always need a large data set. Sampling is important!
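To make the MapReduce idea a bit more concrete, here is a minimal single-machine sketch of the classic word count example. This was not shown in the session; the function names (map_phase, shuffle, reduce_phase, word_count) are my own and purely illustrative, and a real framework such as Hadoop would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling word_count on a handful of short strings is enough to see the shape of it: the map step only ever looks at one document, and the reduce step only ever looks at one key, which is what lets the work be spread out.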

Another point was that velocity and volume are inversely proportional: low volumes allow more real-time analysis, while very large volumes push towards batch processing. Sorting data takes around 80% of the time. Understanding data and sorting models is key: don't expect to get a data dump and have instant insight, as there is no one answer or model. It requires a data scientist to understand the question and go through models.

The process described for getting successful insight was:

  1. Start small
  2. Start with "?"
  3. Iteratively follow the value

Next, some implementations of the Lambda architecture were described and tools discussed, finishing off with a talk about data lakes and their similarity to enterprise data warehousing, i.e. not very much like big data. The quote of the day came in this session:

"Master data management is the enemy of innovation"

Overall a good session highlighting the importance of good analytics in getting insight from data. Having read their big data blog, I was surprised by this, as they prefer the term Big Data Analytics.

Venue

Shoreditch Village Hall is quite a nice venue, though a little bit of a pain to get to from Brighton, and the room layout made mingling a bit harder. The sound had a bit of feedback at the start but soon settled down. The food was great: a very tasty cheese burger (non-cheese and non-meat burgers were also available).
