This document provides an introduction to big data and related technologies. It defines big data as datasets too large to be stored and processed with traditional methods. The motivation for these technologies is the massive growth in the volume and variety of data being generated. Frameworks such as Hadoop and Spark were developed to process this data across clusters of commodity servers. Hadoop uses HDFS for storage and MapReduce for processing; Spark improves on MapReduce through its use of resilient distributed datasets (RDDs) and lazy evaluation. The document outlines several big data use cases and projects in areas such as radio astronomy, particle physics, and engine sensor data, and discusses when Hadoop and Spark are suitable choices.
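
Because the summary contrasts Spark's RDDs and lazy evaluation with MapReduce, a minimal PySpark sketch may help make the point concrete. The local `SparkContext` setup and the sample data below are illustrative assumptions, not drawn from the document itself.

```python
# Minimal PySpark sketch of RDDs and lazy evaluation (illustrative only).
# Assumes a local Spark installation; input data is made up for the example.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-lazy-evaluation-demo")

# parallelize() creates an RDD. flatMap(), map(), and reduceByKey() are
# transformations: Spark records them in a lineage graph but does not
# execute anything yet (lazy evaluation).
lines = sc.parallelize([
    "big data hadoop spark",
    "spark improves on mapreduce",
    "rdds are resilient distributed datasets",
])

word_counts = (
    lines.flatMap(lambda line: line.split())   # split each line into words
         .map(lambda word: (word, 1))          # pair each word with a count of 1
         .reduceByKey(lambda a, b: a + b)      # sum the counts per word
)

# collect() is an action: only now does Spark run the whole lineage,
# distributing the work across the cluster (here, local threads).
print(word_counts.collect())

sc.stop()
```

In a MapReduce job the equivalent map and reduce stages would each write intermediate results to disk, whereas Spark keeps the RDD lineage in memory and only materializes results when an action such as `collect()` or `count()` is called.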