
Sizing up Big Data?

Hitting the Vs

Clive Longbottom,

Service Director, Quocirca Ltd

Big Data
It's not about databases per se. It is about:
- Volume: but not just databases
- Velocity: results need to be produced in near real-time
- Variety: the aspect that is missed by many
- Veracity: how good are the inputs?
- Value: is the data worth it?

Quocirca 2012

A basic rule of thumb


20 years ago:
- Only 20% of an organisation's information was in electronic form
- 80% of this was in a formal database

Today:
- Well over 80% of an organisation's information is in electronic form
- Less than 20% is in a formal database
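Combining the two rules of thumb gives a rough sense of scale. This is a worked reading, not a figure from the slide. It assumes that the second percentage in each period is a share of the electronic information only, and it uses the boundary values for today (exactly 80% electronic, exactly 20% of that in a database). Because today's real values are "well over 80%" and "less than 20%", the true database share is below 20% of all information and the electronic-but-outside-a-database share is above 64%.

```latex
\begin{aligned}
\text{20 years ago:}\quad & 0.20 \times 0.80 = 0.16 && \text{(in a formal database)}\\
& 0.20 - 0.16 = 0.04 && \text{(electronic, outside a database)}\\
\text{Today (boundary values):}\quad & 0.80 \times 0.20 = 0.16 && \text{(in a formal database)}\\
& 0.80 - 0.16 = 0.64 && \text{(electronic, outside a database)}
\end{aligned}
```

On this reading the database share stays roughly level, while electronic information held outside formal databases grows from about 4% to more than 64% of the total.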


The enterprise application dilemma


[Diagram: the CRM, ERP and SCM applications each sit on their own information silo]


The growth of unstructured data

[Chart: growth of unstructured data. Source: Ram Subramanyam Gopalan]

- Not just text, but images, video, media assets, VoIP and videoconferencing
- Replicated/archived data is a large part of the growth
- But is it completely unstructured?

File formatting
- XML (or quasi-XML)
- CSV/tab-delimited
- Text blocks
- Metadata
- TCP/IP packet header information
- Pattern recognition
- Colour, shape, texture (CST)
- Inferred data
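The point of this slide is that much "unstructured" data carries structure that can be recovered. As an illustration only, here is a minimal Python sketch, using just the standard library, of how an ingest step might probe a text blob for three of the listed forms (XML, CSV/tab-delimited, plain text blocks). The function name detect_structure and the "same non-zero delimiter count on every line" heuristic are assumptions made for this sketch, not a method from the presentation.

```python
import csv
import io
import xml.etree.ElementTree as ET


def detect_structure(blob: str) -> dict:
    """Classify a text blob as XML, delimited (CSV/TSV) or free text
    and return whatever structure can be recovered from it."""
    text = blob.strip()

    # 1. XML: try a strict parse; malformed quasi-XML falls through.
    if text.startswith("<"):
        try:
            root = ET.fromstring(text)
            child_tags = {el.tag for el in root.iter()} - {root.tag}
            return {"kind": "xml", "root": root.tag, "fields": sorted(child_tags)}
        except ET.ParseError:
            pass

    # 2. Delimited data: assume a delimiter if every non-blank line
    #    contains the same, non-zero number of it (a simple heuristic).
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for delim, kind in ((",", "csv"), ("\t", "tsv")):
        counts = {ln.count(delim) for ln in lines}
        if len(lines) >= 2 and len(counts) == 1 and counts != {0}:
            rows = [r for r in csv.reader(io.StringIO(text), delimiter=delim) if r]
            return {"kind": kind, "header": rows[0], "row_count": len(rows) - 1}

    # 3. Free text: fall back to simple block-level metadata.
    return {"kind": "text", "word_count": len(text.split()), "line_count": len(lines)}
```

Usage: detect_structure("name,city\nAda,London") reports a CSV header of ["name", "city"] with one data row. Real pipelines would add the other forms on the slide (packet headers, image features, inferred data), which need format-specific tooling.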


The open value chain


[Diagram: information flows between Supplier's supplier, Supplier, Your Organisation, Customers and Customer's customer, alongside open information from e.g. search engines and social networks]


Organisation information sources

Organisation data:
- Enterprise application data
- Office documents
- Reports, analytics
- GRC information
- Information on competitors
- Financial performance data
- Images, voice, video

Supplier information sources

Supplier data:
- Logistics data
- Inventory data
- Transactional data
- Competitive information
- Credit and background checks
- Invoices, catalogues, contracts, images
- Voice, video


Customer information sources

Customer data:
- Orders, payment details, returns information
- Past purchases
- Credit and background checks
- Searches, web analytics
- Social media comments


Information issues
- You no longer have control: the open value chain removes direct control
- Security of information assets is critical
- Identifying and aggregating information assets
- Capturing information when and where it is possible and legal
- Bringing structured and unstructured information together
- Sifting through the dross to get to the golden nuggets

Shrink and filter


Information under your control:
- Deduplicate
- Taxonomise
- Index
- Tag

Information not under your control:
- Filter (intelligently)
- Tag and index it when it crosses your boundaries
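To make the "shrink" steps for information under your control more concrete, here is a minimal Python sketch that deduplicates, tags against a taxonomy and builds an inverted index in a single pass. Everything in it is an illustrative assumption: the TAXONOMY dictionary is hypothetical, duplicates are detected only as exact matches after whitespace and case normalisation (real deduplication tools also catch near-duplicates), and tagging is simple keyword matching.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical taxonomy: tag name -> keywords that trigger that tag.
TAXONOMY = {
    "finance": {"invoice", "payment", "credit"},
    "logistics": {"shipment", "inventory", "delivery"},
}


def fingerprint(text: str) -> str:
    """SHA-256 of whitespace- and case-normalised text, used to spot duplicates."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def tokens(text: str) -> set:
    """Lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shrink(documents: dict) -> tuple:
    """Deduplicate, tag and index a {doc_id: text} collection.

    Returns (kept_ids, tags, index): tags maps doc_id -> set of tag names,
    index is an inverted index mapping word -> set of doc_ids.
    """
    seen = {}                 # fingerprint -> first doc_id with that content
    tags = {}
    index = defaultdict(set)
    for doc_id, text in documents.items():
        fp = fingerprint(text)
        if fp in seen:        # duplicate after normalisation: drop it
            continue
        seen[fp] = doc_id
        words = tokens(text)
        tags[doc_id] = {tag for tag, keys in TAXONOMY.items() if words & keys}
        for word in words:
            index[word].add(doc_id)
    return list(seen.values()), tags, dict(index)
```

For information not under your control, the same tag-and-index step could be applied at the point where external data crosses your boundary, after a filter has discarded what is not wanted.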


Federate and aggregate


- Link databases
- Use master data management
- Bring in unstructured data
- Use Hadoop along with NoSQL datastores (e.g. Cassandra, MongoDB)
- Use cross-function search and reporting tools (e.g. HP Autonomy, CommVault Simpana)
- Use analytics to present results in meaningful ways
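The "link databases / use master data management" idea can be illustrated by linking customer records held in separate application silos (for example CRM and ERP) to a single master record. The Python sketch below is a deliberately simplified stand-in for real MDM tooling: the record layout, the MasterRecord type and matching on a normalised email address are all assumptions for illustration. Production MDM uses richer matching rules, survivorship logic and data stewardship.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    master_id: int
    email: str
    names: set = field(default_factory=set)       # name variants seen across silos
    sources: dict = field(default_factory=dict)   # system name -> list of local ids


def build_master(systems: dict) -> dict:
    """Link customer records from several silos into master records.

    `systems` maps a system name (e.g. "CRM", "ERP") to a list of
    (local_id, name, email) tuples. Records are matched on a trimmed,
    lower-cased email address (a simple, hypothetical matching rule).
    """
    masters = {}
    for system, records in systems.items():
        for local_id, name, email in records:
            key = email.strip().lower()
            if key not in masters:
                masters[key] = MasterRecord(master_id=len(masters) + 1, email=key)
            master = masters[key]
            master.names.add(name)
            master.sources.setdefault(system, []).append(local_id)
    return masters
```

Each master record keeps a pointer back to every local record it links, so reports can federate across the silos without first copying all of the data into one database.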


Basic schematic approach


[Schematic with the stages: filter; apply metadata; App (SQL), MapReduce and NoSQL data stores; search, analyse and report]
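The MapReduce step in the schematic can be sketched as a single-process imitation of the pattern: filter the incoming records, map each one to (key, value) pairs, shuffle (group) the pairs by key, then reduce each group. Hadoop runs these phases distributed across a cluster. The sketch below only shows the data flow, using a word count over hypothetical records whose "source" field stands in for metadata applied earlier.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: dict):
    """Emit (word, 1) for every word in the record's text."""
    for token in record["text"].lower().split():
        word = token.strip(".,!?")
        if word:
            yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, a simple sum."""
    return key, sum(values)


def run_pipeline(records, keep):
    """Filter records with `keep`, then count words via map -> shuffle -> reduce."""
    filtered = (r for r in records if keep(r))
    intermediate = chain.from_iterable(map_phase(r) for r in filtered)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Usage: filtering on the metadata before the map phase (for example, dropping records tagged as spam) keeps the volume that reaches the expensive phases down, in line with the "shrink and filter" advice.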

A future glimpse?
- It's déjà vu all over again: remember in-memory databases?
- Big data cannot remain a jigsaw solution
- Full-service solutions will come forward
- Who will be the winners?
  - Oracle, IBM, Microsoft?
  - SAP?
  - EMC, Symantec?
  - The open source environment (e.g. 10Gen, Apache/Cassandra, CouchDB)?


Conclusions
- Big Data has many vectors
  - Volume, velocity, variety and veracity: each is as important as the others; value will accrue through getting them right
- More information is outside the realm of your direct control
  - Capturing what can be captured in a useful manner is key
- The evolution of the market is rapid
  - NoSQL and Hadoop provide the underpinnings for a new, information-centric approach
- The formal database is not dead
  - But it is only one aspect of the problem and the solution

Thank you
Contact details: Clive.Longbottom@Quocirca.com

Further reading:
http://quocirca.com/reports/150
http://quocirca.com/articles/617
http://quocirca.com/articles/637

