Hector Guerrero- Road to Business Analytics

Back to the Classroom:
The Road to Business Analytics
Professor Hector Guerrero

Disk Storage
1 Bit = Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
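
The ladder above can be sketched in a few lines of Python. This is only an illustration of the powers-of-1000 convention used on the slide; the function names `to_bytes` and `humanize` are made up for this example, and the last two unit names (Brontobyte, Geopbyte) are informal labels rather than standardized prefixes.

```python
# Unit names in the order given on the slide; each step is a factor of 1000.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte",
         "Exabyte", "Zettabyte", "Yottabyte", "Brontobyte", "Geopbyte"]

BITS_PER_BYTE = 8  # 8 bits = 1 byte


def to_bytes(value, unit):
    """Convert a quantity expressed in `unit` into bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Return (value, unit) using the largest unit that keeps value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return value, unit
        value /= 1000


print(to_bytes(1, "Terabyte"))            # bytes in one terabyte
print(to_bytes(1, "Kilobyte") * BITS_PER_BYTE)  # bits in one kilobyte
print(humanize(1500))
```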
Data Science / Big Data / Business or Data Analytics

The tasty recipe for Business Analytics--
Business Analytics, 2nd Edition James R. Evans, ©2016 | Pearson

Origins of the 3 Ingredients
• Probability/Statistics …..1700’s
• Early efforts to understand uncertainty (modern Stats 1900)
• Operations Research …..1940’s
• Attempt to bring greater efficiency to use of scarce resources
• Computer Technology Science …..1940’s
• 1st Ph.D. in Computer Science awarded at Purdue in 1966

What is Data Science and a Data Scientist?
• “Data science, also known as data-driven science, is an interdisciplinary
field about scientific methods, processes, and systems to extract knowledge or
insights from data in various forms, either structured or unstructured, similar to
Knowledge Discovery in Databases (KDD).”
• “Data science is a "concept to unify statistics, data analysis and their related
methods" in order to "understand and analyze actual phenomena" with data. It
employs techniques and theories drawn from many fields within the broad
areas of mathematics, statistics, information science, and computer science,
in particular from the subdomains of machine learning, classification, cluster
analysis, data mining, databases, and visualization.”
Modified from… https://en.wikipedia.org/wiki/Data_science

According to Wikipedia--
“When Harvard Business Review called it "The Sexiest Job of the 21st
Century" the term became a buzzword, and is now often applied to
business analytics, or even arbitrary use of data, or used as a sexed-up
term for statistics. While many university programs now offer a data
science degree, there exists no consensus on a definition or curriculum
contents. Because of the current popularity of this term, there are many
"advocacy efforts" surrounding it.”
Modified from … https://en.wikipedia.org/wiki/Data_science

A Process Map of Data Science
https://en.wikipedia.org/wiki/Data_science

A simple timeline of lessons learned and observations
• 1966– off to Univ. Texas as an EE
• 1970– off to what would become Silicon Valley
• 1978/80– off to Univ. Texas-MBA/Univ. Washington-Ph.D.
• 1982– off to Tuck School at Dartmouth
• 1986– off to Notre Dame
• 1990– off to W&M
• 2017– off to retirement (?)

1966– off to Univ. Texas as an EE
• No computers at my high school, or likely at many high schools at that time.
During orientation, all Engineering majors were required to learn Fortran
programming and write a complex program in 2.5 days. Went from 4500 to 500
majors!
• Lesson-- It was hard to become an engineer at UT, and one way to cull the herd
is to terrorize students and see who survives
• Take first Operations Research classes—I’m in heaven!
• First Ph.D. in Computer Science offered at Purdue Univ.

1970– off to what would become Silicon Valley
• Lockheed Missiles and Space company– 30k employees
• Play Pong by Atari at Andy Capp’s Tavern in Sunnyvale
• Realize that computers are going to be the most important tool in my professional life, and
that my training in math was equally important
• Attend Engineering Economics program at Stanford and introduced to Decision Analysis—
read about early AI concepts, Neural Networks, Rule-Based Systems, Bayesian analysis,
Logic (fuzzy), Expert Systems, etc.
• All seemed important, but a little distant due to lack of computer power– for the most part
it was conceptual. No way, or difficult, to actually use these concepts

1978/80–off to Univ. Texas MBA/ Univ. Washington Ph.D.
• MBA was King/Queen of all Degrees
• I learned there were firms that would pay for abilities in operations research, but very focused (for
example--my ability to do time series forecast models)
• Ex. Later… can you build a model for efficient distribution of natural gas/purchase futures contracts?
• Learned to do modeling of many types—simulation, optimization, etc.
• Still, the capabilities of these techniques were limited by the processing capabilities of computers!
• My dissertation was typed manually—the next year a student colleague used an IBM personal computer.
Ms. Lupe Lopez lost her job—she had typed dissertations for 40 years (sad).

1982– off to Tuck School at Dartmouth
Data General One
• one or two 3.5-inch floppy drives - the first
portable computer to incorporate the new
Sony 3.5-inch disks.
• a huge 11-inch display - the largest of any
portable computer - capable of displaying a
full 25 lines of text with 80 characters per
line.
• weighing only 10 pounds, it is significantly
lighter than competing CRT-based portable
systems, like the IBM Portable
• up to eight hours of run time using the
internal rechargeable batteries.
• The MBA is still King/Queen as long as you are
Finance or Marketing Major– especially
Investment Banking. Jim Bradley was a student in
my classes and a real Geek!
• I was NOT a Dartmouth Man!
• I did begin to see a break to more high-tech jobs
and Entrepreneurship that required technology
• I was still using “baby problems”, “Little–Data” in
the classroom

1986– off to Notre Dame
• I began research on Rule-Based Robotics—simple AI
• Excel comes to forefront as “the working man’s/woman’s analytic platform”
• I had Bill Jelen, Mr. Excel on the internet, in class– He convinced me!!
• Apple produces a video predicting the use of computers and smart
assistants

1990– off to W&M
• Deep Blue (IBM) partially defeats Kasparov in Chess
• Watson was not far behind and more sophisticated use of AI
• Technology became omnipresent
• Could do real demos of analyses in classroom
• Students could follow and try themselves
• Statisticians debate whether they should call themselves Data Scientists
• Big Data and Analytics emerges as the way to compete
“Companies questing for killer apps generally focus all their firepower on the one area that promises to create the greatest
competitive advantage. But a new breed of company is upping the stakes. Organizations such as Amazon, Harrah’s,
Capital One, and the Boston Red Sox have dominated their fields by deploying industrial-strength analytics across a wide
variety of activities. In essence, they are transforming their organization.”
Competing on Analytics, Thomas H. Davenport, January 2006

2017– off to retirement (?)
• I develop and teach an online Business Analytics class in our
OMBA– I was skeptical, but it’s a big success
• I teach Intermediate Probability and Statistics to our inaugural
MSBA class– soon to also be an online program
• I teach an online Business Analytics class to our MAcc program
• I develop an online Business Analytics for UGs
• I wonder if it was the right time to retire– then I remember IT WAS!

Where are we in this brave new world?
• What’s working and Hot? …..AI!!
• The future of the “customer experience”
• Replacement of humans in work
• Autonomous agents, including vehicles
• What’s the future?......AI!!
• Questions about displacement
• Questions about ethics
• Questions about the effect on human existence

August May
What does a Business Analytics Degree look like?

A brief glossary of terms
http://data-informed.com/glossary-of-big-data-terms/ (modified through some omissions)

Some important terms--
Algorithm
• A process or set of rules to be followed in calculations or other problem-solving
operations, especially by a computer.
Analytics
• The discovery, interpretation, and communication of meaningful patterns in data.
Artificial Intelligence
• The theory and development of computer systems able to perform tasks that
normally require human intelligence, such as visual perception, speech
recognition, decision-making, and translation between languages.

Contd.
Data management
According to the Data Management Association, data management incorporates the following practices needed to manage the full data lifecycle in
an enterprise:
data governance
data architecture, analysis, and design
database management
data security management
data quality management
reference and master data management
data warehousing and business intelligence management
document, record, and content management
metadata management
contact data management
Data mining
The process of deriving patterns or knowledge from large data sets.
Data science
A recent term that has multiple definitions, but generally accepted as a discipline that incorporates statistics, data visualization, computer
programming, data mining, machine learning, and database engineering to solve complex problems.
Data scientist
A practitioner of data science.

Data visualization
A visual abstraction of data designed for the purpose of deriving meaning or communicating
information more effectively.
Data warehouse
A place to store data for the purpose of reporting and analysis.
Database
A digital collection of data and the structure around which the data is organized. The data is typically
entered into and accessed via a database management system (DBMS).
Enterprise resource planning (ERP)
A software system that allows an organization to coordinate and manage all its resources, information,
and business functions.
Exploratory data analysis
An approach to data analysis focused on identifying general patterns in data, including outliers and
features of the data that are not anticipated by the experimenter’s current knowledge or
preconceptions. EDA aims to uncover underlying structure, test assumptions, detect mistakes, and
understand relationships between variables.

Contd.
Internet of Things (IoT)
The network of physical objects or “things” embedded with electronics, software, sensors and connectivity to enable it to
achieve greater value and service by exchanging data with the manufacturer, operator and/or other connected devices. Each
thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet
infrastructure.
Machine learning
A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
Machine learning focuses on the development of computer programs that can change when exposed to new data.
Metadata
Any data used to describe other data–for example, a data file’s size or date of creation.
Natural language processing
The ability of a computer program or system to understand human language. Applications of natural language processing
include enabling humans to interact with computers using speech, automated language translation, and deriving meaning
from unstructured data such as text or speech data.
NoSQL
A class of database management system that does not use the relational model. NoSQL is designed to handle large data
volumes that do not follow a fixed schema. It is ideally suited for use with very large data volumes that do not require the
relational model.

A more complete glossary
http://data-informed.com/glossary-of-big-data-terms/ (modified through some omissions)

Analytics and Big Data Glossary
Last updated: 3/16/17
Algorithm
A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
Analytics
The discovery, interpretation, and communication of meaningful patterns in data.
Analytics platform
Application
Software that is designed to perform a specific task or suite of tasks.
Artificial Intelligence
The theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
Behavioral analytics
Using data about people’s behavior to understand intent and predict future actions.
Big data
This term has been defined in many ways, but along similar lines. Doug Laney, then an analyst at the META Group, first defined big data in a 2001 report called “3-D Data Management: Controlling Data Volume, Velocity and Variety.” Volume refers to the sheer size of the datasets. The McKinsey report, “Big Data: The Next Frontier
for Innovation, Competition, and Productivity,” expands on the volume aspect by saying that, “’Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”
Velocity refers to the speed at which the data is acquired and used. Not only are companies and organizations collecting more and more data at a faster rate, they want to derive meaning from that data as soon as possible, often in real time.
Variety refers to the different types of data that are available to collect and analyze in addition to the structured data found in a typical database. Barry Devlin of 9sight Consulting identifies four categories of information that constitute big data:
1. Machine-generated data. This includes RFID data, geolocation data from mobile devices, and data from monitoring devices such as utility meters.
2. Computer log data, such as clickstreams from websites.
3. Textual social media information from sources such as Twitter and Facebook.
4. Multimedia social and other information from Flickr, YouTube, and other similar sites.
Business intelligence (BI)
The general term used for the identification, extraction, and analysis of data.
Classification analysis
Data analysis for the purpose of assigning the data to a particular group or class.
Cloud
A broad term that refers to any Internet-based application or service that is hosted remotely.
Clustering analysis
Data analysis for the purpose of identifying similarities and differences among data sets so that similar data sets can be clustered together.
Computer-generated data
Any data generated by a computer rather than a human–a log file for example.

Contd.
Correlation analysis
• A means to determine a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables.
• Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation
between the demand for a product and its price.
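
One common way to quantify such a relationship is the Pearson correlation coefficient. The sketch below is a minimal illustration, not something taken from the glossary: the function name `pearson_r` and the price/demand figures are invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: unit price and units demanded at that price.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
demand = [50, 44, 41, 33, 30]

r = pearson_r(prices, demand)
print(f"price vs. demand: r = {r:.3f}")  # negative: demand falls as price rises
```

A value near -1 or +1 suggests a strong linear relationship; it does not by itself show that one variable causes the other.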
Customer relationship management (CRM)
• Software that helps businesses manage sales and customer service processes.
Dashboard
• A graphical reporting of static or real-time data on a desktop or mobile device. The data represented is typically high-level to give managers a quick report on status or performance.
Data
• A quantitative or qualitative value. Common types of data include sales figures, marketing research results, readings from monitoring equipment, user actions on a website, market growth projections, demographic information, and customer lists.
Data analytics
• The application of software to derive information or meaning from data. The end result might be a report, an indication of status, or an action taken automatically based on the information received.
Data analyst
• A person responsible for the tasks of modeling, preparing, and cleaning data for the purpose of deriving actionable information from it.
Data architecture and design
• How enterprise data is structured. The actual structure or design varies depending on the eventual end result required. Data architecture has three stages or processes: the conceptual representation of business entities, the logical representation of
the relationships among those entities, and the physical construction of the system to support the functionality.
Data center
• A physical facility that houses a large number of servers and data storage devices. Data centers might belong to a single organization or sell their services to many organizations.
Data cleansing
• The act of reviewing and revising data to remove duplicate entries, correct misspellings, add missing data, and provide more consistency.
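
As a rough sketch of what those steps can look like in practice, the Python below normalizes names, corrects a known misspelling, fills in a missing field, and drops duplicates. The record layout, the `corrections` lookup, and the "UNKNOWN" placeholder are all assumptions made for this illustration.

```python
def cleanse(records, corrections, default_city="UNKNOWN"):
    """Return cleaned records: consistent names, fixed spellings, no duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Consistency: collapse stray whitespace and use title case.
        name = " ".join(rec["name"].split()).title()
        # Missing data: treat None or blank as empty, then fill a placeholder.
        city = (rec.get("city") or "").strip()
        # Misspellings: look up known corrections (keys are lowercase).
        city = corrections.get(city.lower(), city) or default_city
        key = (name, city)
        if key in seen:  # Duplicates: skip records already emitted.
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned


# Hypothetical raw records with a misspelling, a duplicate, and a gap.
raw = [
    {"name": "  ana  lopez ", "city": "Willamsburg"},
    {"name": "Ana Lopez", "city": "Williamsburg"},
    {"name": "Bo Chen", "city": None},
]
fixed = cleanse(raw, corrections={"willamsburg": "Williamsburg"})
print(fixed)
```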
Data collection
• Any process that captures any type of data.
Data integration
• The process of combining data from different sources and presenting it in a single view.
Data integrity
• The measure of trust an organization has in the accuracy, completeness, timeliness, and validity of the data.

Contd.
Data management
• According to the Data Management Association, data management incorporates the following practices needed to manage the full data lifecycle in an enterprise:
• data governance
• data architecture, analysis, and design
• database management
• data security management
• data quality management
• reference and master data management
• data warehousing and business intelligence management
• document, record, and content management
• metadata management
• contact data management
Data marketplace
• A place where people can buy and sell data online.
Data mart
• The access layer of a data warehouse used to provide data to users.
Data migration
• The process of moving data between different storage types or formats, or between different computer systems.
Data mining
• The process of deriving patterns or knowledge from large data sets.
Data model, data modeling
• A data model defines the structure of the data for the purpose of communicating between functional and technical people to show data needed for business processes, or for communicating a plan to develop how data is stored and accessed among application
development team members.
Data science
• A recent term that has multiple definitions, but generally accepted as a discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning, and database engineering to solve complex problems.
Data scientist
• A practitioner of data science.

Data security
• The practice of protecting data from destruction or unauthorized access.
Data structure
• A specific way of storing and organizing data.
Data visualization
• A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively.
Data warehouse
• A place to store data for the purpose of reporting and analysis.
Database
• A digital collection of data and the structure around which the data is organized. The data is typically entered into and accessed via a database management system (DBMS).
Database administrator (DBA)
• A person, often certified, who is responsible for supporting and maintaining the integrity of the structure and content of a database.
Database management system (DBMS)
• Software that collects and provides access to data in a structured format.
Demographic data
• Data relating to the characteristics of a human population.
Distributed processing
• The execution of a process across multiple computers connected by a computer network.
Document management
• The practice of tracking and storing electronic documents and scanned images of paper documents.
Electronic health records (EHR)
• A digitized health record meant to be usable across different health care settings.
Enterprise resource planning (ERP)
• A software system that allows an organization to coordinate and manage all its resources, information, and business functions.
Exploratory data analysis
• An approach to data analysis focused on identifying general patterns in data, including outliers and features of the data that are not anticipated by the experimenter’s current knowledge or preconceptions. EDA aims to uncover underlying
structure, test assumptions, detect mistakes, and understand relationships between variables.
External data
• Data that exists outside of a system.
Contd.

Extract, transform, and load (ETL)
• A process used in data warehousing to prepare data for use in reporting or analytics.
Information management
• The practice of collecting, managing, and distributing information of all types–digital, paper-based, structured, unstructured.
In-memory database
• Any database system that relies on memory for data storage.
In-memory data grid (IMDG)
• The storage of data in memory across multiple servers for the purpose of greater scalability and faster access or analytics.
Internet of Things (IoT)
• The network of physical objects or “things” embedded with electronics, software, sensors and connectivity to enable it to achieve greater value and service by exchanging data with the manufacturer, operator and/or other connected devices. Each
thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet infrastructure.
Location analytics
• Location analytics brings mapping and map-driven analytics to enterprise business systems and data warehouses. It allows you to associate geospatial information with datasets.
Location data
• Data that describes a geographic location.
Machine learning
• A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can change when exposed to new data.
Metadata
• Any data used to describe other data–for example, a data file’s size or date of creation.
Multidimensional database
• A type of database that stores data as multidimensional arrays, or “cubes,” as opposed to the row-and-column storage structure of relational databases. This enables data to be analyzed from different angles for complex queries and online analytical
processing (OLAP) applications.
Natural language processing
• The ability of a computer program or system to understand human language. Applications of natural language processing include enabling humans to interact with computers using speech, automated language translation, and deriving meaning
from unstructured data such as text or speech data.
NoSQL
• A class of database management system that does not use the relational model. NoSQL is designed to handle large data volumes that do not follow a fixed schema. It is ideally suited for use with very large data volumes that do not require the
relational model.
Online analytical processing (OLAP)
• The process of analyzing multidimensional data using three operations: consolidation (the aggregation of available data), drill-down (the ability for users to see the underlying details), and slice and dice (the ability for users to select subsets and view
them from different perspectives).
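
The three operations can be illustrated on a tiny, made-up sales table using plain Python. This is a sketch of the ideas only, not how an OLAP server is implemented; the `SALES` rows and the helper names `consolidate` and `slice_rows` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (region, product, quarter) cell.
SALES = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "amount": 100},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "amount": 150},
    {"region": "West", "product": "Widget", "quarter": "Q1", "amount": 80},
    {"region": "West", "product": "Widget", "quarter": "Q2", "amount": 120},
]


def consolidate(rows, *dims):
    """Aggregate `amount` over every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep only the rows whose dimensions match the fixed values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


# Consolidation: roll everything up to one total per region.
by_region = consolidate(SALES, "region")

# Drill-down: look underneath the East total, broken out by product.
east_detail = consolidate(slice_rows(SALES, region="East"), "region", "product")

# Slice and dice: pick the subset for Q1 Widgets across all regions.
q1_widgets = slice_rows(SALES, quarter="Q1", product="Widget")

print(by_region)
print(east_detail)
print(q1_widgets)
```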
Contd.

Open source software
• Software with source code that is made available by the copyright holder free of charge to the general public. This code may be redistributed, and anyone can inspect and change it.
Pattern recognition
• The classification or labeling of an identified pattern in the machine learning process.
Petabyte
• One million gigabytes, or 1,000 terabytes (1,024 terabytes under the binary convention).
Predictive analytics
• Using statistical functions on one or more datasets to predict trends or future events.
Predictive modeling
• The process of developing a model that will most likely predict a trend or outcome.
Query analysis
• The process of analyzing a search query for the purpose of optimizing it for the best possible result.
R
• An open source software environment used for statistical computing.
Records management
• The process of managing an organization’s records throughout their entire lifecycle, from creation to disposal.
Risk analysis
• The application of statistical methods on one or more datasets to determine the likely risk of a project, action, or decision.
Root-cause analysis
• The process of determining the main cause of an event or problem.
Scalability
• The ability of a system or process to maintain acceptable performance levels as workload or scope increases.
Schema
• The structure that defines the organization of data in a database system.
Search
• The process of locating specific data or content using a search tool.
Contd.

Search data
• Aggregated data about search terms used over time.
Storage
• Any means of storing data persistently.
Structured data
• Data that is organized by a predetermined structure.
Structured Query Language (SQL)
• A programming language designed specifically to manage and retrieve data from a relational database system.
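
For a concrete feel, the sketch below runs a few SQL statements against an in-memory SQLite database through Python's standard `sqlite3` module. The `orders` table and its rows are invented for the illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Manage data: define a table and load a few hypothetical rows.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 80.0), ("Globex", 200.0)],
)

# Retrieve data: total order amount per customer, in alphabetical order.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```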
Terabyte
• 1,000 gigabytes.
Text analytics
• The application of statistical, linguistic, and machine learning techniques on text-based sources to derive meaning or insight.
Transactional data
• Data that changes unpredictably. Examples include accounts payable and receivable data, or data about product shipments.
Transparency
• As more data becomes openly available, the idea of proprietary data as a competitive advantage is diminished.
Unstructured data
• Data that has no identifiable structure – for example, the text of email messages.
Weather data
• Real-time weather data is now widely available for organizations to use in a variety of ways. For example, a logistics company can monitor local weather conditions to optimize the transport of goods. A utility company can adjust energy distribution
in real time.
Whole Earth Model
• An integrated data management system that allows geophysicists, engineers, and financial managers in the oil and gas industry to evaluate the potential of oil and gas fields.
