Gartner defines a strategic technology as one with the potential for significant impact on the enterprise within the next three years. Factors that denote high potential for significant impact include:
- High potential for disruption to IT or the business
- Need for a major dollar investment
- Risk of being late to adopt
Your new best friend: the Data Scientist.
Humans have been whining about being bombarded with too much information since the advent of clay tablets. The complaint in Ecclesiastes that “of making many books there is no end” resonated in the Renaissance, when the invention of the printing press flooded Western Europe with what an alarmed Erasmus called “swarms of new books.”
The U.S. Census of 1880 counted 50 million people. On average, the census had taken the U.S. eight years to complete.
The U.S. Census of 1890 used the Hollerith Tabulating System, which relied on punched cards with 80 variables on them. It took the U.S. one year to complete the census instead of eight.
In 1935, President Franklin D. Roosevelt’s Social Security Act launches the U.S. government on its most ambitious data-gathering project ever, as IBM wins a government contract to keep employment records on 26 million working Americans and 3 million employers.
In 1943, at Bletchley Park, a British facility dedicated to breaking Nazi codes during World War II, engineers develop a series of groundbreaking mass data-processing machines, culminating in the first programmable electronic computer. The device, Colossus, can read paper tape at 5,000 characters a second, reducing decoding time from weeks to a few hours.
In 1961, the NSA, a nine-year-old intelligence agency that has hired 12,000 cryptologists, confronts information overload during the Cold War as it begins collecting and processing signals automatically with computers while struggling to digitize its backlog of records. In July of 1961, the NSA receives 17,000 reels of tape.
In 1989, British computer scientist Tim Berners-Lee proposes leveraging the Internet to share information globally through a hypertext system called the World Wide Web. He is quoted as saying, “The information contained would grow past a critical threshold, so that the usefulness of the scheme would in turn encourage its increased use.”
In 1997, NASA researchers Michael Cox and David Ellsworth use the term “big data” for the first time to describe a familiar challenge of the 1990s: supercomputers generating massive amounts of information.
Retailers begin to amass information on customers’ shopping and personal habits. At the end of 2004, Wal-Mart boasts a cache of 460 terabytes, more than double the amount of data on the Internet at the time.
In 2009, India establishes the Unique Identification Authority of India – to fingerprint, photograph and take an iris scan of all 1.2 billion people in the country and assign each person a 12-digit ID number, funneling all the data into the world’s largest database.
In 2011, scanning 200 million pages of information, or 4 terabytes of disk storage, in a matter of seconds, IBM’s Watson computer system defeats two human challengers on the quiz show Jeopardy! The New York Times later dubs this moment a “triumph of Big Data computing.”
In March of 2012 the Obama administration announces a $200 million Big Data Research and Development Initiative calling for every federal agency to have a “big data strategy.”
Akamai analyzes 75 million events per day to better target advertisements.
Processes and mines petabytes of user data to power “people you may know.”
Processed over 4 TB worth of raw images into 11 million finished PDFs in 24 hours.
Decoding the human genome used to take ten years and now it can be done in seven days.
Its systems process 2.5 billion pieces of content and 500+ terabytes of data each day. It is pulling in 2.7 billion Like actions and 300 million photos per day. It has over 100 petabytes of data stored in a single Hadoop disk cluster. The company believes it has the single largest Hadoop cluster in the world.
Companies are moving rapidly from batch processing to real-time processing to gain a competitive advantage.
The more you copy and move your data, the less reliable it becomes.
More diverse data leads to greater insights. Combining multiple data sources can lead to the most interesting insights of all.
Don’t throw your data away.
The number of photos, emails, and IMs, while large, is limited by the number of people. The volume of data from networked “sensors” such as mobile phones, GPS and other devices is much larger.
Don’t think of big data as a standalone, shiny new technology. Think about your core business problems and how to solve them by analyzing big data.
More data alone isn’t sufficient. Look for ways to broaden the use of data across your organization.
Those that fail to leverage the numerous internal and external data sources available will be leapfrogged by new entrants.
Transcript of "Big Data Dreams"
“BIG DATA DREAMS”
Michael C. DeAloia
Regional Vice President - Cleveland
ROADMAP
• What is Big Data?
• History of Big Data
• Big Data by the Numbers
• 8 Laws of Big Data
• Q&A
WHAT IS BIG DATA?
“Gartner has defined ‘Big Data’ as a Strategic Technology for 2013”
WHAT IS BIG DATA?
Big Data /big ‘date/: a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Challenges of big data include capture, curation, storage, search, sharing, transfer, analysis and visualization.
WHAT IS BIG DATA?
The three Vs characterize what big data is all about, and also help define the major issues that IT needs to address:
• Volume. The massive scale and growth of unstructured data outstrips traditional storage and analytical solutions.
• Variety. Traditional data management processes can’t cope with the heterogeneity of big data, or “shadow” or “dark data,” such as access traces and Web search histories.
• Velocity. Data is generated in real time, with demands for usable information to be served up immediately.
WHAT IS BIG DATA?
“Big Data is the new oil.”
- Bryan Trogdon, “Big Data” Pew Research Report
WHAT IS BIG DATA? (AND WHAT IT ISN’T)
Big Data Analytics is…
• A technology-enabled strategy for gaining richer, deeper insights into customers, partners, and the business, and ultimately gaining competitive advantage.
• Working with data sets whose size and variety is beyond the ability of typical database software to capture, store, manage, and analyze.
• Processing a steady stream of real-time data in order to make time-sensitive decisions faster than ever before.
• Distributed in nature. Analytics processing goes to where the data is for greater speed and efficiency.
• A new paradigm in which IT collaborates with business users and “data scientists” to identify and implement analytics that will increase operational efficiency and solve new business problems.
• Moving decision making down in the organization and empowering people to make better, faster decisions in real time.
Big Data Analytics Isn’t…
• Just about technology. At the business level, it’s about how to exploit the vastly enhanced sources of data to gain insight.
• Only about volume. It’s also about variety and velocity. But perhaps most important, it’s about value derived from the data.
• Generated or used only by huge online companies like Google or Amazon anymore. While Internet companies may have pioneered the use of big data at web scale, applications touch every industry.
• About “one-size-fits-all” traditional relational databases built on shared disk and memory architecture. Big data analytics uses a grid of computing resources for massively parallel processing (MPP).
• Meant to replace relational databases or the data warehouse. Structured data continues to be critically important to companies. However, traditional systems may not be suitable for the new sources and contexts of big data.
WHAT IS BIG DATA?
“Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.”
- Eric Schmidt, CEO of Google
By 2015 the digital universe is expected to reach 8 zettabytes.
- Intel
WHAT IS BIG DATA?
1 Zettabyte = 18 million copies of the Library of Congress
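To put the slide figures on one scale, here is a rough back-of-the-envelope sketch in Python. It assumes decimal (SI) units, which the slides do not specify, and the function names are made up for illustration. The per-copy size of the Library of Congress is only what the "18 million copies" figure implies, not a measured value.

```python
# Back-of-the-envelope arithmetic for the figures quoted on the slides.
# Assumption: decimal (SI) units, e.g. 1 zettabyte = 10**21 bytes.
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21

# Figures taken from the slides.
LOC_COPIES_PER_ZETTABYTE = 18_000_000  # "1 Zettabyte = 18 million copies"
EXABYTES_PER_TWO_DAYS = 5              # "something like five exabytes"
DIGITAL_UNIVERSE_2015_ZB = 8           # "expected to reach 8 Zettabytes"


def implied_loc_size_tb() -> float:
    """Size of one Library of Congress copy implied by the slide, in TB."""
    return ZETTABYTE / LOC_COPIES_PER_ZETTABYTE / TERABYTE


def daily_creation_eb() -> float:
    """Data created per day implied by the two-day figure, in exabytes."""
    return EXABYTES_PER_TWO_DAYS / 2


def loc_copies_in_2015() -> int:
    """Library of Congress copies equivalent to the 2015 projection."""
    return DIGITAL_UNIVERSE_2015_ZB * LOC_COPIES_PER_ZETTABYTE


if __name__ == "__main__":
    print(f"Implied size of one copy: {implied_loc_size_tb():.1f} TB")
    print(f"Data created per day: {daily_creation_eb()} EB")
    print(f"2015 projection: {loc_copies_in_2015():,} copies")
```

Under these assumptions, one copy works out to roughly 55.6 TB, the daily rate to 2.5 exabytes, and the 2015 projection to 144 million copies.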
WHAT IS BIG DATA? (AND WHO WORKS IT…)
A new kind of professional is helping organizations make sense of the massive streams of digital information: the data scientist. Data scientists are responsible for modeling complex business problems, discovering business insights, and identifying opportunities.
They bring to the job:
• Skills for integrating and preparing large, varied data sets
• Advanced analytics and modeling skills to reveal and understand hidden relationships
• Business knowledge to apply context
• Communication skills to present results
BY THE NUMBERS
• More sources and more devices
• Mobile
  • Pictures
  • Video
  • SMS
  • GPS
• Social Media
  • Facebook
  • Twitter
  • Youtube
  • Reviews
• Automated Sources
  • RFID
  • Telemetry
  • Security cameras
Real-time correlation of data can be turned into golden nuggets of information.
8 LAWS OF BIG DATA
Big Data Law #1: The Faster You Analyze Your Data, the Greater its Predictive Power.
Great list developed by Dave Feinleib, Managing Director of Big Data Group.
8 LAWS OF BIG DATA
Big Data Law #2: Maintain one copy of your data, not dozens.
8 LAWS OF BIG DATA
Big Data Law #3: Use more diverse data, not just more data.
8 LAWS OF BIG DATA
Big Data Law #4: Data has value far beyond what you originally anticipate.
8 LAWS OF BIG DATA
Big Data Law #5: Plan for Exponential Growth.
8 LAWS OF BIG DATA
Big Data Law #6: Solve a real pain point.
8 LAWS OF BIG DATA
Big Data Law #7: Put data and humans together to get more insight.
8 LAWS OF BIG DATA
Big Data Law #8: Big Data is transforming business the same way IT did.
Q&A
Michael C. DeAloia
Regional Vice President
Expedient Data Centers
(M) 216.212.4067
(E) firstname.lastname@example.org