A history of Big Data by Dr John Mashey. It discusses 1890 census data, IBM mainframes, DEC minicomputer clusters, Silicon Graphics (SGI) applications and a history of Open Source computing.
Big Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
1. Big Data – Yesterday, Today, and Tomorrow (alpha)
John R. Mashey
Monday 09/16/13
2. Speaker – John R. Mashey
• Pennsylvania State University, 1964-1973, BS Math, MS/PhD Computer Science
• Bell Labs 1973-1983, MTS → Supervisor, early UNIX
– Programmer’s Workbench, shell programming, text processing, workload measurement/tuning in first UNIX computer center, UNIX+mainframe data mining apps, capacity planning/tuning
• Convergent Technologies 1983-1984, MTS → Director Software
– Compiler & OS tuning, uniprocessor/multiprocessor servers
• MIPS Computer Systems 1985-1992, Manager OS → VP Systems Technology
– System coprocessor, TLB, interrupt-handling; byte addressing(!), halfword instructions; ISA evolution, multiprocessor features, multi-page-size TLB, 64-bit
– MIPS Performance Brief editor; a SPEC benchmarking group founder, 1988
– Hot Chips Conference (Stanford) committee ... continuing
• Silicon Graphics 1992-2000, Director Systems Technology → VP & Chief Scientist
– ccNUMA system architecture (NUMAflex in Origin 3000, Altix, still) … small to REALLY BIG
– Performance issues in HPC, DBMS; technology forecasting
– Evangelist, much work with sales and marketing, business development, strategy
– By 1994, started using “Big Data” in modern sense, made it an SGI marketing theme
• Advise high-tech companies, VCs
– Technical advisory boards, as for wireless sensor net companies, software co’s, Carbon Zero Inst.
– Computer History Museum (www.computerhistory.org) Trustee; VCTaskForce
– Travel; ski; hike; bike; write articles & speak, blog on climate issues
(Map labels: New Jersey, Silicon Valley, PA)
3. Why I’m Here … Fingered by the NY Times
http://en.wikipedia.org/wiki/Talk:Big_data
http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story
‘Since I first looked at how he used the term, I liked Mr. Mashey as the originator of Big Data. In the 1990s, Silicon Graphics was the giant of computer graphics, used for special effects in Hollywood and for video surveillance by spy agencies. It was a hot company in the Valley that dealt with new kinds of data, and lots of it….

When I called Mr. Mashey recently, he said that Big Data is such a simple term, it’s not much of a claim to fame. His role, if any, he said, was to popularize the term within a portion of the high-tech community in the 1990s. “I was using one label for a range of issues, and I wanted the simplest, shortest phrase to convey that the boundaries of computing keep advancing,” …

At the University of Pennsylvania, Mr. Diebold kept looking into the subject as well. … His most recent paper concludes: “The term Big Data, which spans computer science and statistics/econometrics, probably originated in the lunch-table conversations at Silicon Graphics in the mid-1990s, in which John Mashey figured prominently.”’
4. Overview
• Big Data – has my definition changed?
• Yesterdays – many, with some lessons
– “Those who cannot remember the past are condemned to repeat it” – George Santayana
• Today – how did we get here
• Tomorrow – 30,000-foot view, issues
• Q&A (but ask during, while slides up, may ask to hold)
5. Big Data – “How has your definition changed from 1994?”
• Not at all!??
• My definition always had 2 ideas:
• Big Data was:
– Beyond widely-available computers (Current):
• Memory capacity (Volume)
• I/O performance / real-time issues (Velocity)
• Complex/difficult data – multimedia, 3D models, scientific data sets (Variety)
• Big Data was:
– A moving target:
• Trivial – fits in memory of a widely-available system
• Easy – plus a few local disks (or, once upon a time, tapes)
• Big – needs a large system, or a tight-coupled network of them; nontrivial programming, sometimes much roll-your-own
• Impossible within the state of the art, even with huge $$$
– Large 1970 mainframes could get 1MB of memory (~1µs), 8MB slow – ~cache memory on a laptop now
• People have done Big Data for a long time; only the buzzwords change
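The “moving target” scale above can be sketched as a toy classifier. The function name, thresholds, and example sizes below are illustrative assumptions, not figures from the talk:

```python
# Toy sketch of the slide's "moving target" scale: the same dataset is
# Trivial, Easy, Big, or Impossible only relative to the machines of its day.
# All parameter names and example sizes here are illustrative assumptions.

def classify(data_bytes: int, memory_bytes: int,
             local_disk_bytes: int, max_feasible_bytes: int) -> str:
    if data_bytes <= memory_bytes:
        return "Trivial"     # fits in memory of a widely-available system
    if data_bytes <= local_disk_bytes:
        return "Easy"        # plus a few local disks (or tapes)
    if data_bytes <= max_feasible_bytes:
        return "Big"         # large system(s); nontrivial, roll-your-own code
    return "Impossible"      # beyond the state of the art, even with huge $$$

MB, GB, TB = 10**6, 10**9, 10**12

# A 1GB dataset against a 1970-style mainframe (1MB memory, ~400MB storage):
print(classify(1 * GB, 1 * MB, 400 * MB, 10 * GB))    # Big
# The same 1GB dataset against a modern laptop (16GB memory):
print(classify(1 * GB, 16 * GB, 2 * TB, 100 * TB))    # Trivial
```

The same input moves down the scale as the thresholds grow, which is the slide’s point: the data didn’t change, the widely-available machines did.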
6. Big Data – 1890
• Early Big Data – the 1890 US Census
Image courtesy of Computer History Museum. www.computerhistory.org/collections/catalog/102618690
7. Big Data ~1950
• IBM 026 Keypunch, 082 Sorter and 403 Accounting Machine (c.1950)*
Image courtesy of Computer History Museum. www.computerhistory.org/collections/catalog/102645475
Image courtesy of Computer History Museum. www.computerhistory.org/collections/catalog/102670856
Plugboard … “software” for the accounting machine:
“Weekly Statement Panel. Take Manual Final Before Entering Data. Hammers Plits A=3 IF N-2-7-18-20. Long Hammer Locks N-8-9. To Set Up Date, SW 1 ON. Feed Card with date punched 34-39 & turn SW 1 OFF.”
* But this sort of gear was still around in 1967.
I did actually use a sorter a few times when a
box of program cards got dropped.
8. Big Data – Yesterday, Today and Tomorrow 7
Big Data ~1960s-
• IBM 360 Model 40 (c.1965) 24-bit address: 16MB memory maximum.
A “Big” mainframe was a 360/65, 67* or 75:
Up to 1MB of main (magnetic) core storage.
Optionally, up to 8MB of slower “Large Core Storage”
360/40 -medium system with 4 tape drives in back,
and two 2311 disk drives, each max capacity = 7.2MB.
By 1966, the first 2314 drives shipped, each 28MB.
A system with 10 28MB drives was big.
A rare IBM 2321 Data Cell ** nicknamed
“noodle snatcher” or “washing machine”
offered 400MB of random-access storage!!
Ours worked OK, but many had mechanical trouble.
1968: 360/67: first IBM 360 with virtual memory …
CP-67/CMS → VM/CMS virtual machines … 40+ years
SERVICE BUREAUS – shared access to mainframes
otherwise unaffordable
CENTRALIZED
Stack wars: big vendors
Image courtesy of Computer History Museum.
www.computerhistory.org/collections/catalog/102618836
By the 1960s, Big Data was 2400ft, 9-track tape:
40MB max – 1600 BPI IBM 24xx
In the 1970s, IBM upped this:
170MB max – 6250 BPI IBM 34xx, 1.25MB/sec
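The tape capacities above follow directly from reel length and recording density. A quick sanity check (raw capacity only; inter-record gaps are why real-world maxima were somewhat lower than the raw figure):

```python
# Raw half-inch tape capacity: length (ft) * 12 (in/ft) * density (bytes/in).
# On 9-track tape, bytes/inch equals the quoted BPI (8 data tracks + parity).
# Inter-record gaps reduce usable capacity below this raw figure.

def tape_capacity_mb(length_ft, bpi):
    """Raw capacity in MB (10^6 bytes), ignoring inter-record gaps."""
    return length_ft * 12 * bpi / 1e6

print(tape_capacity_mb(2400, 1600))  # ~46 MB raw -> ~40 MB usable (IBM 24xx)
print(tape_capacity_mb(2400, 6250))  # ~180 MB raw -> ~170 MB usable (IBM 34xx)
```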
9. Big Data – Yesterday, Today and Tomorrow 8
Big Data ~1970s
• IBM 370, virtual-memory upgrades from 360s
– 1972- 370/168: typical high end: up to 8MB of fast memory
Not Big Data yet, but signs on the horizon…
• Minicomputers really got rolling - Digital Equipment Corporation
– 1973 – PDP-11/45, 248KB memory, 16-bit – 16 users
– 1975 – PDP-11/70, up to 4MB – 48 users
– 1975 – DEC RP04 disk drive ~92MB
– 1977 - VAX 11/780, up to 8MB
– 1977 – DEC RP06 disk drive ~178MB
– Clustering to compete with real mainframes
• Microprocessors? Late in decade, “toys”
– But personal computers, workstations got going
• Stack wars – algorithmic languages
Image courtesy of Computer History Museum.
www.computerhistory.org/collections/catalog/102685442
10. Big Data – Yesterday, Today and Tomorrow 9
Big Data ~1970s – 1980s
• Bell Labs (1973-1983), 1M+ person Bell System, 25,000 people in R&D
– First half: Programmer’s Workbench/UNIX
» “Small is Beautiful and Other Thoughts on Programming Strategies”
» Use tools, existing software, scripting languages
» Small teams, fast iterations / prototyping, move quickly, get user feedback (sound familiar?)
– Big Data
» Track every telephone pole, cable, junction box, geography, trouble reports, squirrels, guns
» ~400 operations support systems
– Charge people for $.10 phone calls: Murray Hill, NJ Building 5 analyzed call records
– Loop Maintenance Operations System (LMOS)
» IBM Mainframe with larger database, with triplexed minicomputers to support calls to 411
» CRAS – backend data mining system, ACE expert system offshoot (BTL 1st)
» LMOS transaction database work → Tuxedo → Novell → BEA → Oracle
– “The Cloud” in many internal talks on the telephone system
» ACS (BDN, Net/1, Net/1000) processing in the network,
dumb terminals at edge, $1B
» Attempt to build a packet-switched “Internet Cloud” in 1970s … too early
11. • Teradata – parallel data warehouses, special hardware (started in 1970s)
» 1983 – first Beta system; 1991 NCR → AT&T, Teradata → NCR, later spun out
» 1992 – first system > 1TB, 1999 – 130TB … en.wikipedia.org/wiki/Teradata
» Started using “Big Data” term in 2010 … but classic Big Data company from start
• 1987 RAID – UC Berkeley, Patterson, Gibson, Katz
Not Big Data yet, but signs on the horizon…
• Network stack wars: Networks, DARPA funded UCB
– 1983 – 4.2BSD, with TCP/IP
• RDBMS wars – Oracle, Informix, Sybase, etc, etc
• UNIX stack version wars, by vendor and camp
• Distribution wars: workstations, client-server, PCs, networks, thin-clients
• Microprocessor wars of late 1980s, early 1990s
– Multiprocessors for bigger systems, rapid end for most minicomputer companies
– First 64-bit microprocessor: MIPS R4000, design begun 1988, shipped 1991
• Stack wars – window systems
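The RAID work cited above (Patterson, Gibson, Katz, 1987) rests on a simple idea: keep an XOR parity block alongside the data blocks, so any single lost block can be rebuilt. A minimal sketch of the parity math only, not of any actual RAID implementation:

```python
# RAID-style XOR parity: the parity block is the byte-wise XOR of the
# data blocks; any one missing block equals the XOR of the survivors.
from functools import reduce

def parity(blocks):
    """Byte-wise XOR across equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(data)

# Simulate losing data[1] and rebuilding it from the rest plus parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]
```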
Big Data – Yesterday, Today and Tomorrow 10
Big Data ~1980s
12. • Internet growing, WWW, Multimedia
Browser wars
Not Big Data, but Linux on the horizon
• 1992 – 64-bit micro-based systems
– SGI Crimson, early 1992, still running 32-bit OS, no new 32-bit-only designs
– DEC Alpha systems, late in year, 64-bit-only (plausible)
• 1994 - SGI
– Large systems running 64-bit IRIX, with 64/32-bit user programs
– 1993’s Challenge XL got OS upgrade, up to 36p or 16GB memory
– XFS – Full 64-bit UNIX file systems, journaled, for serious Big Data
» Later contributed to Linux, along with scalability improvements
- Customers could finally just recompile and use >4GB of memory in one program
- “The Long Road to 64 Bits – Double, Double, Toil and Trouble”
http://queue.acm.org/detail.cfm?id=1165766
http://cacm.acm.org/magazines/2009/1/15667-the-long-road-to-64-bits/fulltext
Big Data – Yesterday, Today and Tomorrow 11
Big Data ~1990s (SGI)
13. Big Data – Yesterday, Today and Tomorrow 12
“Hardware, Wetware, Software” (1994, 1995, 1996)
14. Big Data – Yesterday, Today and Tomorrow 13
“HW, WW, SS” (1994, 1995, 1996)
15. Big Data – Yesterday, Today and Tomorrow 14
“HW, WW, SS” Traditional SGI
SGI – SC’96
Supercomputing show
November 1996
16. Big Data – Yesterday, Today and Tomorrow 15
“HW, WW, SS” – New Markets
17. Big Data – Yesterday, Today and Tomorrow 16
“Hardware, Wetware, Software”
Video2
Video1
18. Big Data – Yesterday, Today and Tomorrow 17
Money Can Buy Bandwidth, but Latency is Forever
19. Big Data – Yesterday, Today and Tomorrow 18
Money Can Buy Bandwidth, but Latency is Forever
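The aphorism on these slides can be made concrete with the usual simple model: transfer time = latency + size / bandwidth. Buying 10× the bandwidth transforms a large transfer but barely moves a small, latency-bound request. The link numbers below are hypothetical, chosen only to illustrate the shape of the curve:

```python
def transfer_time(size_bytes, latency_s, bandwidth_Bps):
    """Simple model: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_Bps

# Hypothetical link with 10 ms latency; compare 10 MB/s vs 100 MB/s.
for bw in (1e7, 1e8):
    small = transfer_time(1_000, 0.010, bw)            # 1 KB request
    large = transfer_time(1_000_000_000, 0.010, bw)    # 1 GB transfer
    print(f"{bw:.0e} B/s: 1KB = {small * 1000:.2f} ms, 1GB = {large:.2f} s")
```

At 10× the bandwidth the 1 GB transfer drops from ~100 s to ~10 s, while the 1 KB request improves from 10.10 ms only to 10.01 ms: latency is forever.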
20. Big Data – Yesterday, Today and Tomorrow 19
Lower Response Times Changes Applications
21. Big Data – Yesterday, Today and Tomorrow 20
Big Data @ SGI ~1995
22. Big Data – Yesterday, Today and Tomorrow 21
Big Data @ SGI ~1995
23. Big Data – Yesterday, Today and Tomorrow 22
Big Data @ SGI ~1997
24. Big Data – Yesterday, Today and Tomorrow 23
Big Data @ SGI ~1997
25. Big Data – Yesterday, Today and Tomorrow 24
Big Data @ SGI ~1997
26. Big Data – Yesterday, Today and Tomorrow 25
Big Data @ SGI ~1998
https://www.usenix.org/conference/1999-usenix-annual-technical-conference/big-data-and-next-wave-infrastress-problems
27. Big Data – Yesterday, Today and Tomorrow 26
Big Data @ SGI ~1998
28. Big Data – Yesterday, Today and Tomorrow 27
Big Data @ SGI ~1998
29. Big Data – Yesterday, Today and Tomorrow 28
Big Data @ SGI ~1998
30. Big Data – Yesterday, Today and Tomorrow 29
Big Data @ SGI ~1998-
31. Big Data – Yesterday, Today and Tomorrow 30
Big Data @ SGI ~1998-
32. Big Data – Yesterday, Today and Tomorrow 31
Big Data Present – How Did We Get Here?
• 1994 Beowulf technical PC clusters (not Big Data, but sign on horizon)
• 2003 AMD 64-bit X86, 2004 Intel
• Virtual machines … ~ IBM VM from 1970s
• Web companies like Google, etc
• Map-Reduce, Hadoop – software infrastructure did for relevant apps
– What parallel programming tools did in 1990s for technical codes
– What WWW did for making Internet more accessible
• Amazon Web Services, etc … echo of 1970s service bureaus
• Open-source sharing, GitHub, etc.
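The map-reduce model named above can be sketched in a few lines. This is a toy single-process word count showing the three phases (map, shuffle, reduce), not Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(doc):
    # Mapper: emit a (word, 1) pair for every word in the document.
    return [(w, 1) for w in doc.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big", "data tools"]
pairs = [p for d in docs for p in map_phase(d)]
print(reduce_phase(shuffle(pairs)))  # {'big': 2, 'data': 2, 'tools': 1}
```

The real frameworks add what the toy omits: partitioning the map and reduce work across machines, and recovering from failures mid-job.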
33. Big Data – Yesterday, Today and Tomorrow 32
Big Data Future – 30,000-foot view
• Hardware discontinuities always cause turmoil, then industry settles
• Fairly predictable to happen, straightforward
• Moore’s Law for CMOS still gets a few more turns, getting very hard
• Clock-rate stalled → multi-core, parallel programming more important
• Power/heat increasingly important
• Disks keep getting denser, but bandwidth less so, seek/rotation times: no
• Network speeds keep improving, but widespread 100Ge not much yet
• Fairly predictable to happen, effects may be surprising
• Internet-of-things, wireless sensor networks
• Personal graphics, see Vernor Vinge, “Rainbows End”
• Flash as more than SSD?
• Big changes whenever the memory hierarchy changes
• Many software issues … 4KB pages (as per S/360) are bad news
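The 4KB-page complaint above is about reach: a TLB covers only (entries × page size) of memory, and with S/360-era 4KB pages that coverage has not kept pace with memory sizes. A rough calculation with hypothetical but typical TLB sizes:

```python
# TLB reach = entries * page size. With 4KB pages, a typical TLB covers
# only a few MB of a multi-GB memory; larger pages restore coverage.

def tlb_reach_mb(entries, page_bytes):
    """TLB coverage in MiB."""
    return entries * page_bytes / 2**20

print(tlb_reach_mb(512, 4096))       # 2.0 MiB covered with 4KB pages
print(tlb_reach_mb(512, 2 * 2**20))  # 1024.0 MiB covered with 2MB pages
```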
34. Penn State – MIS446 33
Retrospective – “Open Source”
• “Open Source” is most recent term for “ancient” practice
1948 – David Wheeler invents subroutines for EDSAC @ Cambridge
1952 – John von Neumann donates designs for Princeton IAS
1955 – IBM SHARE User’s Group founded; user groups trade code
1961 – DECUS (Digital Equipment Corporation) user group founded
1960s – IBM HASP (mainframe OS code, user-modified)
“Should old Chuck Forney be forgot, and HASP songs sung no more.” *
1960s – IBM S/360 vastly increases set of compatible systems, code-trading
Penn State ASSIST (Mashey & others), 1970- … still running 38 years later!
1970s – UNIX “open source” within Bell Labs
1970s – UNIX licensed to universities, government, “as is, don’t call us”
1970s – John Lions “Commentary on UNIX, with Source Code”
1970s – Berkeley UNIX, Ken Thompson, DARPA $, Internet
1976 – B. W. Kernighan, P. J. Plauger, “Software Tools”, (UNIX tools) RATFOR.
Software Tools User’s Group (STUG) to get UNIXy code on other systems
1979 – UNIX V7 released – (reasonably) portable OS
1985 – Free Software Foundation (UNIX commands, especially GNU C)
1991 – Linux (kernel); later Apache, etc, etc.
• Local libraries → magnetic tapes → UUCP → Internet → Web, GitHub, etc
• Local groups → vendor-based groups → large expansion for public
* Sung to “Auld Lang Syne”; Forney was Asst Director of PSU Computer Center
35. Big Data – Yesterday, Today and Tomorrow 34
Big Data Future – Stack wars
• Another period of turmoil in software in this area
• Tall, wide software stacks via multiple vendors
• Long ago, the whole stack was little more than
Simple OS, compiler/assembler/linker, a few libraries
• Creating software on top of stack
• Assess choices carefully
• As in 1980s, Russian Roulette with a few years’ delay
• Software inside a stack
• Best technology does not always win
• Alliances, partners really matter
• Customers help!
• Anything that gets enough of the market lasts a long time … last slide
36. Penn State – MIS446 35
Retrospective … Future
• John R. Mashey, “Languages, Levels, Libraries, and Longevity”
– ACM Queue, Vol. 2, No. 9 - Dec/Jan 2004-2005
– http://www.acmqueue.org/modules.php?name=Content&pa=printer_friendly&pid=245&page=1
‘In 50 years, we’ve already seen numerous programming systems come and (mostly) go,
although some have remained a long time and will probably do so for: decades? centuries?
millennia? …
For the far future, Vernor Vinge’s fine science-fiction novel, A Deepness in the Sky, rings
all too true. The young protagonist, Pham, has joined a starship crew and is learning the
high-value vocation of “programmer archaeologist,” as the crew’s safety depends on the
ability to find needed code, use it, and modify it without breaking something. He is initially
appalled at the code he finds:
“The programs were crap…Programming went back to the beginning of time…There were
programs here that had been written five thousand years ago, before Humankind ever left
Earth. The wonder of it—the horror of it…these programs still worked…down at the very
bottom of it was a little program that ran a counter. Second by second, the Qeng Ho
counted from the instant that a human had first set foot on Old Earth’s moon. But if you
looked at it still more closely… the starting instant was actually about fifteen million seconds
later, the 0-second of one of Humankind’s first computer operating systems…”
“We should rewrite it all,” said Pham.
“It’s been done,” said Sura.
“It’s been tried,” corrected Bret…“You and a thousand friends would have to work for a
century or so to reproduce it… And guess what—even if you did, by the time you finished,
you’d have your own set of inconsistencies. And you still wouldn’t be consistent with all the
applications that might be needed now and then…”
“The word for all this is ‘mature programming environment.’”
37. Big Data – Yesterday, Today and Tomorrow 36
Extra
38. 37
For the 2002 BSDcon, I grabbed talks from 30 years ago, and used (images of) the
original foils for authenticity, to help see what's changed and what's the same.*
The first part, "Small is Beautiful and Other Thoughts on Programming Strategies,"
was first used in 1977, and was later given many times as Association for
Computing Machinery (ACM) National Lectures.
I was working on the Programmer's Workbench flavor of UNIX, and we'd had great
success in making UNIX available to much wider audiences of software engineers
targeting both UNIX-related and non-UNIX environments.
We were strong believers in UNIX philosophies of tool-building and -using, and
keeping software teams small during an era when there was strong emphasis on
methodologies and large teams that were anything but lightweight. This talk was the
result, and was considered somewhat radical at the time.
Scripting languages, development environments, “agile development”
* I still have the original foils, but they’re starting to wear out, and actually, old
overhead projectors have started to disappear in favor of computers….
Originals were UNIX troff + hand-drawn graphics … not PowerPoint!
From: http://www.usenix.org/events/bsdcon/mashey_small, Thanks USENIX!
Small is Beautiful
And Other Thoughts on Programming Strategies (1977-)
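The tool-building philosophy described above is still best illustrated by a pipeline of small, single-purpose programs. A classic word-frequency example in that spirit (assuming a POSIX shell with the standard `tr`, `sort`, and `uniq` utilities):

```shell
# Word-frequency count built entirely from existing tools joined by pipes:
# split input into one word per line, sort, count duplicates, rank by count.
printf 'small is beautiful and small is fast\n' |
  tr -s ' ' '\n' |
  sort |
  uniq -c |
  sort -rn
```

No new program is written; each stage does one thing, and the pipe is the integration mechanism.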
39. Small Is Beautiful 38
Evolution and Entropy
40. Small Is Beautiful 39
How Things Get Complex
41. Small Is Beautiful 40
Evolution and Entropy
42. Small Is Beautiful 41
How Things Get Complex
45. Penn State – MIS446 44
Miscellaneous
• Code is malleable, invisible, more art than science
– Yes, but much more visible than it used to be (open source + Web)
– There has long been some (but not much) science in software engineering,
there has always been art, [or good taste]
there is more good engineering than there used to be
• Code is political, often instantiates rules invisibly
– Yes, see Ravenflow, we still really need requirements → code (automagically)
– Business-English use cases converted to Visio charts by hand, ugh.
– But, much code is now at high-level and visible
– What’s an Excel spreadsheet? In the “old days”, it would have been FORTRAN
• Access privileges, work-arounds, trapdoors, audits
– Humans have always been the weakest links
– Overly-simple system is not sufficiently flexible
– Overly-complex system is too much trouble, generates loopholes and bad behavior
• Code may be ephemeral … but actually has amazing longevity
– Lasts far longer than hardware!
46. Penn State – MIS446 45
Use Existing Tools
47. Big Data – Yesterday, Today and Tomorrow 46
Silicon Valley Waves
http://web.archive.org/web/20090101000000*/http://www.next10.org/pdf/GII/Next10_FullFindings_EN.pdf
(UC) Berkeley