Smart Home Technologies
Data Management and Databases
Databases for Smart Homes
 Requirements
 Database Types
 Database Technologies
 Smart Home Databases
 Data Mining
Data Storage Requirements
 Sensor data (number of sensors @ data rate per sensor)
 Temperature (15 @ 8 Kbps)
 Humidity (15 @ 8 Kbps)
 Gas (15 @ 8 Kbps)
 Light (15 @ 8 Kbps)
 Motion (15 @ 8 Kbps)
 Pressure (100 @ 8 Kbps)
 Microphone (15 @ 500 Kbps)
 Camera (15 @ 10 Mbps)
Data Storage Requirements
 User data
 Multimedia
 Phone messages/conversations (500 Kbps – 10 Mbps)
 Music (500 Kbps)
 TV/Radio broadcasts (500 Kbps – 10 Mbps)
 Home movies (10 Mbps)
 Images
 Computer
 Programs
 Data files
 Operating systems
Data Storage Issues
 Issues
 Query frequency and type
 Sampling/recording rates
 205 sensors (158,900 Kbps)
 Multimedia recordings
 Simultaneous playback
 Analysis, prediction, decision-making queries
 Transaction granularity
 Historical data, decay
 Security and privacy
 Centralized vs. distributed
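The "205 sensors (158,900 Kbps)" figure follows directly from the counts and per-sensor rates on the Data Storage Requirements slide. A minimal sketch, assuming decimal units (1 Mbps = 1,000 Kbps) and sensors that stream continuously at their listed rates; the per-day volume is derived here for illustration and is not a number from the slides.

```python
# Sensor inventory from the "Data Storage Requirements" slide:
# sensor type -> (number of sensors, rate per sensor in Kbps).
# Assumption: decimal units, so 10 Mbps = 10,000 Kbps.
SENSORS = {
    "temperature": (15, 8),
    "humidity": (15, 8),
    "gas": (15, 8),
    "light": (15, 8),
    "motion": (15, 8),
    "pressure": (100, 8),
    "microphone": (15, 500),
    "camera": (15, 10_000),
}


def total_sensors(sensors: dict) -> int:
    """Total number of sensors across all types."""
    return sum(count for count, _ in sensors.values())


def total_rate_kbps(sensors: dict) -> int:
    """Aggregate data rate if every sensor streams continuously."""
    return sum(count * rate for count, rate in sensors.values())


def gigabytes_per_day(rate_kbps: float) -> float:
    """Convert a continuous rate in Kbps to decimal gigabytes per day."""
    bits_per_day = rate_kbps * 1_000 * 86_400
    return bits_per_day / 8 / 1e9


if __name__ == "__main__":
    rate = total_rate_kbps(SENSORS)
    print(total_sensors(SENSORS), rate, round(gigabytes_per_day(rate), 1))
```

Under these assumptions the cameras alone contribute 150,000 of the 158,900 Kbps (about 94%), and recording everything would produce roughly 1.7 TB per day, which is why deciding what data to store matters so much.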
What Data to Store
 Type of Data
 Raw data
 Pre-processed
 Compressed
 Frequency of Data Storage for Sensor Data
 Trade-off between precision and quantity
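One common way to trade precision for quantity is a deadband (change-threshold) filter: a reading is stored only if it differs from the last stored value by more than a threshold. This is an illustration of that general technique, not a method the slides prescribe; the threshold and readings are hypothetical.

```python
from typing import List, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, sensor value)


def deadband_filter(readings: List[Reading], threshold: float) -> List[Reading]:
    """Keep the first reading, then only readings that moved more than
    `threshold` away from the most recently *stored* value."""
    stored: List[Reading] = []
    for ts, value in readings:
        if not stored or abs(value - stored[-1][1]) > threshold:
            stored.append((ts, value))
    return stored


if __name__ == "__main__":
    # Hypothetical temperature samples taken once a minute.
    temps = [(0, 70.0), (60, 70.1), (120, 70.2), (180, 71.0), (240, 71.05), (300, 69.0)]
    print(deadband_filter(temps, threshold=0.5))
```

A larger threshold stores fewer rows but reconstructs the signal more coarsely; a threshold of zero keeps every change.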
Sensor Data Example
 9/8/2002 2:0:1 AM~A5 (Coffee Maker) ON
 9/8/2002 1:6:59 AM~A9 (A/C) ON
 9/8/2002 3:58:52 AM~A0 (Stereo) ON
 9/8/2002 5:57:0 AM~A2 (Kitchen Light) ON
 9/8/2002 3:1:42 AM~A5 (Coffee Maker) OFF
 9/8/2002 7:8:3 AM~A3 (Stove) ON
 9/8/2002 12:54:52 PM~A10 (Bathroom Light) ON
 9/8/2002 4:58:5 AM~A0 (Stereo) OFF
 9/8/2002 8:1:20 AM~A3 (Stove) OFF
 9/8/2002 9:6:10 AM~A8 (Computer) ON
 9/8/2002 10:8:19 AM~A4 (Bathtub Heater) ON
 9/8/2002 11:9:4 AM~A0 (Stereo) ON
 9/8/2002 9:4:5 AM~A8 (Computer) OFF
 9/8/2002 10:9:4 AM~A4 (Bathtub Heater) OFF
 9/8/2002 2:2:5 PM~A10 (Bathroom Light) OFF
 9/8/2002 2:52:37 PM~A0 (Stereo) OFF
 9/8/2002 4:2:0 PM~A9 (A/C) OFF
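Before this log can be queried it has to be parsed into structured records. A minimal sketch, assuming every line follows the `date time~code (name) STATE` shape shown above and that dates are month/day/year; the record and function names are hypothetical.

```python
import re
from datetime import datetime
from typing import List, NamedTuple

# Matches lines such as "9/8/2002 2:0:1 AM~A5 (Coffee Maker) ON".
LINE_RE = re.compile(
    r"^(?P<ts>.+?)~(?P<code>A\d+) \((?P<name>[^)]+)\) (?P<state>ON|OFF)$"
)


class DeviceEvent(NamedTuple):
    when: datetime
    code: str
    name: str
    state: str


def parse_line(line: str) -> DeviceEvent:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    # Assumption: US month/day/year order; strptime accepts unpadded fields.
    when = datetime.strptime(m["ts"], "%m/%d/%Y %I:%M:%S %p")
    return DeviceEvent(when, m["code"], m["name"], m["state"])


def parse_log(lines: List[str]) -> List[DeviceEvent]:
    """Parse non-blank lines and sort them chronologically,
    since the raw log is not in time order."""
    return sorted((parse_line(l) for l in lines if l.strip()), key=lambda e: e.when)
```

Sorting the sample this way shows, for example, that the Computer OFF record (9:4:5 AM) precedes its ON record (9:6:10 AM), the kind of inconsistency that the later data-cleansing step has to handle.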
Media Viewing Example
Watching Events
Date (MMDDYY) | Day | Mood | Start | End | Device | Program name | Type | Comments | Others | Rating
020302 | Su | normal | 1330 | 1600 | T | nba basketball | sports | dallas mavericks go team | none | 5
020302 | Su | normal | 1700 | 2100 | t | super bowl | sports | gotta watch the commercials | Dad | 5
020402 | m | normal | 1900 | 2000 | t | boston public | drama | hot teachers | none | 5
020402 | m | normal | 2000 | 2100 | t | ally mcbeal | drama | funny lawyers | none | 4
020402 | m | normal | 2300 | 100 | V | WWF RAW | wrestling | testosterone | none | 5
020502 | t | normal | 2100 | 2200 | t | philly | drama | hot lawyers | none | 4
020602 | w | bored | 1830 | 2200 | t | nba basketball | sports | GO MAVS | none | 5
020702 | th | tired | 1900 | 2100 | t | wwf smackdown | wrestling | its me soap | none | 5
020702 | th | tired | 2100 | 2200 | t | ER | drama | good show | none | 4
020802 | f | excited | 1900 | 2230 | t | olympics | sports | gotta watch | none | 4
020902 | sa | excited | 1900 | 2230 | t | olympics | sports | gotta watch | none | 4
021002 | su | ecstatic | 1500 | 1800 | t | NBA allstar game | sports | gotta see what happens | none | 3
012802 | M | normal | 1900 | 2000 | T | Boston Public | Drama | hot chicks teaching | none | 5
012802 | M | normal | 2000 | 2100 | T | Ally McBeal | Drama | hot chicks lawyering | none | 5
Start and End are 24-hour HHMM times; an End earlier than its Start (2300 to 100) runs past midnight.
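Derived attributes such as viewing time per program type are typical of what an agent would compute from this table. A minimal sketch, assuming Start and End are 24-hour HHMM integers as in the table and that no session lasts 24 hours or more; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hhmm_to_minutes(t: int) -> int:
    """Convert a 24-hour HHMM integer such as 1330 or 100 to minutes after midnight."""
    hours, minutes = divmod(t, 100)
    return hours * 60 + minutes


def duration_minutes(start: int, end: int) -> int:
    """Viewing time in minutes; an end earlier than the start wraps past midnight."""
    return (hhmm_to_minutes(end) - hhmm_to_minutes(start)) % (24 * 60)


def minutes_by_type(rows: List[Tuple[int, int, str]]) -> Dict[str, int]:
    """Total viewing minutes per program type (case-insensitive)."""
    totals: Dict[str, int] = defaultdict(int)
    for start, end, program_type in rows:
        totals[program_type.lower()] += duration_minutes(start, end)
    return dict(totals)


if __name__ == "__main__":
    rows = [(1330, 1600, "sports"), (1700, 2100, "sports"),
            (2300, 100, "wrestling"), (1900, 2000, "Drama")]
    print(minutes_by_type(rows))
```

The modulo handles the WWF RAW row (2300 to 100), which would otherwise come out negative.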
Multimedia Example
 Digital Silhouettes (Predictive Networks)
 Predicting web surfing behavior ($$$)
 Microsoft (2002): tracking TV viewing preferences
 140 data items for each user
 Demographics (50)
 Subcategories within gender, age, income, education, occupation, and race
 Content preferences (90)
 e.g., golf, music, yoga
Database Types / Data Models
 Relational
 OO
 Hybrid (Object-Relational)
 Temporal
 Deductive
 Others
 Spatial, …
Example Data Representations
 Relational
 Flat tables of atomic attributes, linked by foreign-key relationships
 OO
 Complex data representations
 Multivalued and composite attributes
 Temporal
 Relational model: add valid start and end dates to each table (versions of information and when each was valid)
 Includes time, events, durations, …
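The temporal representation above (valid start and end dates on each row) can be sketched with Python's built-in `sqlite3`. The `setpoint` table, its columns, and the `9999-12-31` "still valid" marker are hypothetical choices for illustration, not part of any system in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical valid-time table: each row is one version of a thermostat
# setpoint plus the half-open interval [valid_from, valid_to) it was valid for.
conn.execute("""
    CREATE TABLE setpoint (
        room       TEXT NOT NULL,
        degrees_f  INTEGER NOT NULL,
        valid_from TEXT NOT NULL,  -- ISO-8601 text sorts chronologically
        valid_to   TEXT NOT NULL   -- '9999-12-31' marks the current version
    )
""")
conn.executemany(
    "INSERT INTO setpoint VALUES (?, ?, ?, ?)",
    [
        ("living", 68, "2002-09-01", "2002-09-08"),
        ("living", 70, "2002-09-08", "9999-12-31"),
    ],
)
conn.commit()


def setpoint_as_of(room: str, day: str):
    """Return the setpoint that was valid for `room` on `day`, or None."""
    row = conn.execute(
        "SELECT degrees_f FROM setpoint"
        " WHERE room = ? AND valid_from <= ? AND ? < valid_to",
        (room, day, day),
    ).fetchone()
    return row[0] if row else None
```

Updates in this style close the current row (set its `valid_to`) and insert a new version instead of overwriting, so history is preserved.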
Operations
 DDL/DML (data definition/manipulation languages)
 SQL
 OQL
 Update operations
 Built-in insert, delete, update
 Stored procedures for triggers and active (ECA) rules
Example Operations for Temporal Databases
 INCLUDES
 Rows valid in a certain time period
 BEFORE/AFTER a time condition
 Set operations
 Union, intersection of 2 time periods
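The period operations listed above can be sketched on half-open intervals. This is an illustrative sketch with hypothetical names, using plain integers (e.g., minutes) as time points; here BEFORE allows the periods to touch, whereas Allen's interval algebra distinguishes "before" from "meets". The union of two disjoint periods is really a set of periods, so this single-period version returns None in that case.

```python
from typing import NamedTuple, Optional


class Period(NamedTuple):
    """Half-open time period [start, end) over integer time points."""
    start: int
    end: int


def includes(p: Period, t: int) -> bool:
    """INCLUDES: time point t falls inside period p."""
    return p.start <= t < p.end


def before(p: Period, q: Period) -> bool:
    """p ends no later than q starts."""
    return p.end <= q.start


def after(p: Period, q: Period) -> bool:
    return before(q, p)


def intersection(p: Period, q: Period) -> Optional[Period]:
    start, end = max(p.start, q.start), min(p.end, q.end)
    return Period(start, end) if start < end else None


def union(p: Period, q: Period) -> Optional[Period]:
    """Single-period union; None if the periods neither overlap nor touch."""
    if p.end < q.start or q.end < p.start:
        return None
    return Period(min(p.start, q.start), max(p.end, q.end))
```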
Active DB
 Event-Condition-Action rules
 Allow decisions to be made in the database instead of in a separate application
 Relational
 Implemented as triggers
 Challenges
 Rule consistency
 (two or more rules must not contradict each other)
 Guaranteed termination
 Trigger loops (T1 <-> T2)
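One conservative way to check the termination challenge is to build a graph in which an edge T1 -> T2 means "T1's action can fire T2" and look for cycles: an acyclic graph guarantees termination, while a cycle (such as T1 <-> T2) means the rules might loop. A minimal sketch; the graph is assumed to be known statically, and the trigger names are hypothetical.

```python
from typing import Dict, List


def find_trigger_cycle(fires: Dict[str, List[str]]) -> List[str]:
    """Return one cycle of trigger activations (first name repeated at the end),
    or [] if the activation graph is acyclic."""
    nodes = set(fires)
    for targets in fires.values():
        nodes.update(targets)
    state = {n: "new" for n in nodes}  # new -> active (on the path) -> done
    path: List[str] = []

    def visit(node: str) -> List[str]:
        state[node] = "active"
        path.append(node)
        for nxt in fires.get(node, []):
            if state[nxt] == "active":  # back edge: a loop of activations
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == "new":
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return []

    for node in sorted(nodes):
        if state[node] == "new":
            cycle = visit(node)
            if cycle:
                return cycle
    return []
```

The check is conservative: a cycle may still terminate in practice if the rules' conditions stop it, but an empty result proves no trigger chain can repeat.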
Smart Home Active DB Example
 Java, Postgres, Jess rules
 Event classification (local and composite)
 Data Manipulation Events
 TV show being viewed (channel, time, genre, …)
 Temporal Events (instance, recurring)
 Set temperature to 70 degrees at 7:00 a.m. on workdays
 Exception Events
 Power failure
 Behavioral Events
 Time children come home from school; dinner time
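A recurring temporal event such as "set temperature to 70 degrees at 7:00 a.m. on workdays" needs a way to compute when it next fires. A minimal sketch, assuming workdays are Monday to Friday with no holidays; the constants and function name are hypothetical and this is not the Java/Postgres/Jess implementation named above.

```python
from datetime import datetime, time, timedelta

RULE_TIME = time(7, 0)  # 7:00 a.m.
SETPOINT_F = 70         # action parameter from the rule (degrees)


def next_occurrence(after: datetime) -> datetime:
    """Next workday 7:00 a.m. strictly after `after`."""
    candidate = datetime.combine(after.date(), RULE_TIME)
    if candidate <= after:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # 9/8/2002 (the date in the sensor log) was a Sunday.
    print(next_occurrence(datetime(2002, 9, 8, 12, 0)))
```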
Active DB Example (TCU)
Title | Event | Condition | Action
TV View Menu | TV turned on | Molly is holding remote | Display shows matching Molly’s preferences
Entry Lighting | Inhabitant enters house | Light level < threshold | Adjust lighting to predetermined level
Aromatherapy | Every Friday night when Hanna sits on sofa | Always | Release aroma
Night Idle | John on sofa idle > 15 minutes, TV & lights are on | No other inhabitant in room | Turn off all devices in the room
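The Entry Lighting rule maps directly onto a relational trigger, as the Active DB slide describes. A minimal sketch using SQLite through Python's `sqlite3`; the table names, the threshold of 30, and the target level of 60 are hypothetical stand-ins for "threshold" and "predetermined level".

```python
import sqlite3

LIGHT_THRESHOLD = 30.0  # hypothetical light level below which the rule fires
TARGET_LEVEL = 60.0     # hypothetical "predetermined level"

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE entry_event (
        inhabitant  TEXT NOT NULL,
        light_level REAL NOT NULL
    );
    CREATE TABLE device_command (
        inhabitant TEXT NOT NULL,
        command    TEXT NOT NULL,
        level      REAL NOT NULL
    );
    -- Event: a row is inserted into entry_event (an inhabitant enters).
    -- Condition: the WHEN clause (light level below the threshold).
    -- Action: queue a lighting command for the device controller.
    CREATE TRIGGER entry_lighting
    AFTER INSERT ON entry_event
    FOR EACH ROW
    WHEN NEW.light_level < {LIGHT_THRESHOLD}
    BEGIN
        INSERT INTO device_command (inhabitant, command, level)
        VALUES (NEW.inhabitant, 'set_lighting', {TARGET_LEVEL});
    END;
""")

if __name__ == "__main__":
    with conn:  # commit both inserts together
        conn.execute("INSERT INTO entry_event VALUES ('Molly', 12.0)")
        conn.execute("INSERT INTO entry_event VALUES ('John', 80.0)")
    print(conn.execute("SELECT * FROM device_command").fetchall())
```

Only Molly's entry satisfies the condition, so only one command is queued; a separate process would read `device_command` and drive the lights.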
Distributed vs. Centralized
 A centralized database can become a bottleneck
 Large volume of data input
 Large database
 Large volume of queries
 In distributed databases, data consistency, replication, and retrieval can be more problematic
 Consistency of schemas
 Retrieval when the data location is not known
 Communication overhead to keep the database consistent
SmartHome Database Architecture
 Centralized vs. distributed?
 Answer: Both
 Central storage of high-demand, persistent data
 Distributed storage of low-demand, dynamic data
 Distributed queries
 Push processing toward sensors
 Adaptive, hierarchical organization
 End-effector autonomy (“smart sensor”)
Database Systems
 Commercial
 DB2
 Empress
 Informix
 Oracle
 MS Access
 MS SQL Server
 Sybase
 Free
 Berkeley DB
 PostgreSQL
 MySQL
UTA MavHome DB
 Active
 Reactive & proactive (e.g., to predict)
 Distributed
 Information collection agents
 Rules
 Local agents: what data they need to collect
 Distributed: coordinate overall monitoring of collected information
 Continuous monitoring of events
 Extension of SNOOP
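Continuous event monitoring often needs composite events such as "an inhabitant enters, then sits on the sofa within 10 minutes". Snoop offers composite-event operators, including a sequence operator; the sketch below is a much-simplified sequence detector written for illustration and does not implement Snoop's syntax or parameter contexts. Event names and the window are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    name: str
    t: float  # seconds since an arbitrary epoch


def detect_sequence(events: Iterable[Event], first: str, second: str,
                    window: float) -> List[Tuple[Event, Event]]:
    """Report (initiator, terminator) pairs where a `second` event follows the
    most recent unconsumed `first` event within `window` seconds.
    Assumes events arrive in timestamp order."""
    matches: List[Tuple[Event, Event]] = []
    pending: Optional[Event] = None
    for ev in events:
        if ev.name == first:
            pending = ev  # a newer initiator replaces an older one
        elif ev.name == second and pending is not None:
            if ev.t - pending.t <= window:
                matches.append((pending, ev))
            pending = None  # each initiator is used at most once
    return matches
```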
Microsoft EasyLiving DB (2002)
 Relational
 Fast & robust, but awkward for some data
 World Model DB Describes:
 Computing devices
 People and their personal preferences/settings
 Services
 Rooms and doorways
 Serves as an abstraction layer between sensors and the applications that use sensor data
 e.g., adding new sensors requires no change to applications
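The abstraction-layer idea can be sketched in a few lines: applications query the world model, and each sensor type only has to translate its own readings into the model. All class and method names here are hypothetical; this is not EasyLiving's actual interface.

```python
from typing import Dict, Protocol, Tuple


class LocationSensor(Protocol):
    """Anything that can report (person, room)."""
    def read(self) -> Tuple[str, str]: ...


class BadgeReader:
    def __init__(self, person: str, room: str) -> None:
        self.person, self.room = person, room

    def read(self) -> Tuple[str, str]:
        return (self.person, self.room)


class CameraTracker:
    """A different sensor technology with a different native format."""
    def __init__(self, detection: Dict[str, str]) -> None:
        self.detection = detection

    def read(self) -> Tuple[str, str]:
        return (self.detection["who"], self.detection["where"])


class WorldModel:
    def __init__(self) -> None:
        self._room_of: Dict[str, str] = {}

    def ingest(self, sensor: LocationSensor) -> None:
        person, room = sensor.read()
        self._room_of[person] = room

    def room_of(self, person: str) -> str:
        return self._room_of.get(person, "unknown")


def lighting_app(model: WorldModel, person: str) -> str:
    """Application code: depends only on the world model, never on sensors."""
    return f"lights on in {model.room_of(person)}"
```

Adding `CameraTracker` required no change to `lighting_app`, which is the property the slide describes.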
Stanford Interactive Workspace
 Uses LORE
 A semi-structured XML DB system
 Still available, but work stopped in 2000
 Data stored is a catalog of (index to):
 Documents, images, 3-D models, application-specific domain models
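Semi-structured data means catalog entries share some elements but not others. This sketch does not use LORE or its query language; it only shows the idea with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The catalog contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: entries share <title> but otherwise differ in structure.
CATALOG = """
<catalog>
  <item type="document"><title>Meeting notes</title><author>Lee</author></item>
  <item type="image"><title>Whiteboard</title><width>1024</width></item>
  <item type="model3d"><title>Room layout</title><format>VRML</format></item>
  <item type="image"><title>Floor plan</title></item>
</catalog>
"""


def titles_of_type(xml_text: str, item_type: str) -> list:
    """Titles of all catalog items whose type attribute equals `item_type`."""
    root = ET.fromstring(xml_text.strip())
    return [item.findtext("title")
            for item in root.findall(f".//item[@type='{item_type}']")]
```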
Sensor Database Systems
 COUGAR project
 www.cs.cornell.edu/database/cougar
 Query processing over ad-hoc sensor networks
 Small database component (QueryProxy) at each sensor
 Sensor clusters provide local aggregations (e.g., min, max, mean)
 Assumes a centralized index of all data sources
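Local aggregation only works if clusters forward partial state rather than final answers: averaging two cluster means is wrong when the clusters differ in size. A minimal sketch of a common in-network aggregation technique, carrying (count, sum, min, max) so partials can be merged in any order; this is a generic illustration, not COUGAR's code.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Partial:
    """Mergeable partial aggregate for min, max, and mean."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(value: float) -> "Partial":
        return Partial(1, value, value, value)

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total,
                       min(self.minimum, other.minimum),
                       max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate(values: Iterable[float]) -> Partial:
    """Aggregate at least one reading into a Partial."""
    partials = [Partial.of(v) for v in values]
    if not partials:
        raise ValueError("need at least one reading")
    return reduce(Partial.merge, partials)
```

For clusters reporting [20, 22] and [30], merging partials gives the correct mean of 24, whereas averaging the two cluster means would give 25.5.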
Siemens Netabase
 “The network is the database.”
 Navas and Wynblatt, ACM SIGMOD 2001
 Sensor networks
 Large number of data sources (10^5)
 Volatile data and data organization
 “Thin” data servers on scaled-down hardware
 Netabase approach
 Query decomposition
 Characteristic routing (à la IP routing)
 Local joins
 Query evaluation
Siemens Netabase
 www.netabasesoftware.com
Data Warehouses
 Repositories for data mining activities
 Aggregates/summaries of data help efficiency
 Optimized for decision support, not transaction processing
 Definition (Elmasri, page 900):
 “A subject-oriented, integrated, non-volatile, time-variant collection of data in support of management’s decisions”
 Replace “management” with “smart home agents”
Warehouse Properties
 Very large: 100 gigabytes to many terabytes
 Tends to include historical data
 Workload: mostly complex queries that access lots of data and do many scans, joins, and aggregations; they tend to look for "the big picture".
 Updates pumped to warehouse in batches (overnight)
 Data may be heavily summarized and/or consolidated in advance (this must also be done in batches and must finish overnight).
 Research work has been done (e.g., "materialized views"), but it addresses only a small piece of the problem.
02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
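Summarizing data in advance can be sketched as a summary table rebuilt by a batch job. SQLite has no materialized views, so a table recomputed from scratch stands in for one here; the table names and readings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw sensor readings (the operational data).
conn.executescript("""
    CREATE TABLE reading (sensor TEXT, day TEXT, value REAL);
    INSERT INTO reading VALUES
        ('kitchen_temp', '2002-09-08', 70.0),
        ('kitchen_temp', '2002-09-08', 74.0),
        ('kitchen_temp', '2002-09-09', 68.0),
        ('den_light',    '2002-09-08', 300.0);
""")


def rebuild_daily_summary(conn: sqlite3.Connection) -> None:
    """Batch job (e.g., run overnight): recompute the summary from scratch."""
    conn.execute("DROP TABLE IF EXISTS daily_summary")
    conn.execute("""
        CREATE TABLE daily_summary AS
        SELECT sensor, day,
               COUNT(*)   AS n,
               AVG(value) AS avg_value,
               MIN(value) AS min_value,
               MAX(value) AS max_value
        FROM reading
        GROUP BY sensor, day
    """)
    conn.commit()


rebuild_daily_summary(conn)
```

Queries for "the big picture" then read the small `daily_summary` table instead of scanning every raw reading.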
Data Warehouses
 Data Cleaning
 Data Migration: simple transformation rules (replace "gender" with "sex")
 Data Scrubbing: use domain-specific knowledge (e.g., zip codes) to modify data. Try parsing and fuzzy matching from multiple sources.
 Data Auditing: discover rules and relationships (or signal violations thereof). Not unlike data mining.
 Data Loading
 Can take a very long time! (Sorting, indexing, summarization, integrity constraint checking, etc.) Parallelism is a must.
 Full load: like one big transaction (xact); the change from old data to new is atomic.
 Incremental loading ("refresh") makes sense for big warehouses, but the transaction model is more complex: the load must be broken into many transactions that are committed periodically to avoid locking everything. Metadata and indices must be kept consistent along the way.
02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
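Incremental loading with periodic commits, combined with the "gender" to "sex" migration rule above, can be sketched with `sqlite3`. The `person` table, field names, and batch size are hypothetical; a real warehouse load would also maintain indices and metadata.

```python
import sqlite3
from typing import Dict, List


def migrate(record: Dict[str, str]) -> Dict[str, str]:
    """Data migration rule from the slide: rename the 'gender' field to 'sex'."""
    record = dict(record)
    if "gender" in record:
        record["sex"] = record.pop("gender")
    return record


def incremental_load(conn: sqlite3.Connection,
                     records: List[Dict[str, str]], batch_size: int) -> int:
    """Load records in batches, committing after each batch so no single
    transaction holds locks for the whole load. Returns the number of batches."""
    batches = 0
    for i in range(0, len(records), batch_size):
        batch = [migrate(r) for r in records[i:i + batch_size]]
        with conn:  # commits this batch on success, rolls it back on error
            conn.executemany(
                "INSERT INTO person (name, sex) VALUES (:name, :sex)", batch
            )
        batches += 1
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, sex TEXT)")
    rows = [{"name": f"p{i}", "gender": "F"} for i in range(5)]
    print(incremental_load(conn, rows, batch_size=2))
```

If a batch fails, only that batch rolls back; earlier batches stay committed, which is exactly why the slide warns that the transaction model is more complex than a full load.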
Data Mining Definition
 Discovery of new information in terms of patterns or rules from vast amounts of data
 Extracts patterns that can’t readily be found by asking the right questions (queries)
 TOO MUCH DATA FOR HUMANS
 Emerged from
 Artificial intelligence: machine learning, neural nets, genetic algorithms
 Statistics
 Operations Research
Data Mining Steps
 Data selection -- pick the data needed
 Data cleansing
 Fix bad data (e.g., spelling, zip codes)
 Hard to deal with missing, erroneous, conflicting, redundant data
 Enrichment
 Add data (e.g., age, gender, income)
 Data transformation
 Aggregate (e.g., zip codes → regions)
 Data mining
 Reporting on discovered knowledge
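The steps above can be sketched as a small pipeline over toy purchase records. Everything here is hypothetical: the records, the first-digit ZIP-to-region lookup, and the use of a simple count as a stand-in for the mining step; enrichment is omitted.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical lookup for the transformation step: first ZIP digit -> region.
REGION_BY_ZIP_PREFIX = {"7": "South Central", "9": "West"}


def select_fields(records: List[Dict]) -> List[Dict]:
    """Data selection: keep only the fields the analysis needs."""
    return [{"zip": r.get("zip"), "bought": r.get("bought")} for r in records]


def cleanse(records: List[Dict]) -> List[Dict]:
    """Data cleansing: trim ZIP+4 to 5 digits and drop unusable ZIPs."""
    clean = []
    for r in records:
        z = (r["zip"] or "").strip()[:5]
        if len(z) == 5 and z.isdigit():
            clean.append({**r, "zip": z})
    return clean


def transform(records: List[Dict]) -> List[Dict]:
    """Data transformation: aggregate ZIP codes into regions."""
    return [{**r, "region": REGION_BY_ZIP_PREFIX.get(r["zip"][0], "Other")}
            for r in records]


def mine(records: List[Dict]) -> Counter:
    """Stand-in for the mining step: count purchases per region."""
    return Counter(r["region"] for r in records if r["bought"])
```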
Types of Results
 Association rules
 Buy diapers → buy lots of beer
 Sequential patterns
 Buy house → buy furniture within months
 Classification trees
 Types of buyers (upscale, bargain-conscious, …)
 Why do it?
 Make more money
 Science & medicine
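An association rule such as "buy diapers → buy beer" is usually judged by its support and confidence. A minimal sketch over hypothetical baskets, using the standard definitions: support is the fraction of transactions containing an itemset, and confidence is support(lhs ∪ rhs) / support(lhs).

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Estimated P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


if __name__ == "__main__":
    baskets = [{"diapers", "beer"}, {"diapers", "beer", "milk"},
               {"diapers", "bread"}, {"milk", "bread"}]
    print(support(baskets, {"diapers", "beer"}),
          confidence(baskets, {"diapers"}, {"beer"}))
```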
Data Mining Goals
 Find patterns to predict future events
 Find major groupings
 Groupings of buyers, stars, diseases …
 Find which group something belongs to
 e.g., creditworthiness
Data Mining Results
 Association rules
 Classification hierarchies
 Clustering
 Sequential patterns
 Patterns within time series
 Type of result, inputs & algorithms vary
 Often interested in some combination of these types of knowledge
Clustering
 Unsupervised learning techniques
 Training samples are unclassified
 Vs. supervised learning (classification)
 Drug categories for depression
 Categories of TV viewers
 Categories of buyers (likely, unlikely)
 Categories of households?
 Single male, mother/children, conventional (M/D/kids), DINKs (double income, no kids)
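Clustering TV viewers can be sketched with k-means (Lloyd's algorithm), one widely used unsupervised technique. The two features (weekly hours of sports and of drama) and the data points are hypothetical, and the initial centroids are chosen by hand for a deterministic example; practical use would pick them randomly or with k-means++.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Lloyd's k-means from caller-supplied initial centroids.
    Returns (centroids, clusters); no labels are needed, i.e. unsupervised."""
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:  # assignment step: nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: _dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        new_centroids = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```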
Sequential Patterns
 Detecting associations among events with certain temporal relationships
 Example:
 Cardiac bypass for blocked arteries
 AND within 18 months, high blood urea
 THEN kidney failure likely in next 18 months
 Particularly important in smart homes
Sequential Pattern Discovery
 Sequence of itemsets
 Grocery store purchases by 1 person (3 itemsets)
 {soy milk, bread, chocolate}, {bananas, chocolate}, {lettuce, tomato, chocolate}
 2 subsequences
 {soy milk, bread, chocolate}, {bananas, chocolate}
 {bananas, chocolate}, {lettuce, tomato, chocolate}
Sequential Pattern Discovery
 The support for a sequence S is the percentage of sequences in a given set U of which S is a subsequence
 That is: in what fraction of the sequences in U does S appear?
 Find all subsequences of the given sequence sets that have at least a user-defined minimum support
 A frequent sequence S1, S2, …, Sn predicts that a customer who buys itemset S1 is likely to buy itemset S2, then S3, …
 The prediction's support is based on how often this sequence occurred in the past
 Many research issues remain in designing good algorithms
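The support definition above can be computed directly. A minimal sketch using the usual containment rule: S is a subsequence of T if each itemset of S is a subset of a distinct itemset of T, in the same order, with gaps allowed. Each sequence in U counts at most once. The extra sequences in the usage test are hypothetical.

```python
from typing import List, Set

Sequence = List[Set[str]]  # ordered list of itemsets


def is_subsequence(s: Sequence, t: Sequence) -> bool:
    """True if each itemset of s is contained in a distinct itemset of t,
    in the same order (greedy earliest matching is sufficient)."""
    j = 0
    for itemset in t:
        if j < len(s) and s[j] <= itemset:
            j += 1
    return j == len(s)


def sequence_support(s: Sequence, sequences: List[Sequence]) -> float:
    """Fraction of sequences in U of which s is a subsequence."""
    return sum(is_subsequence(s, t) for t in sequences) / len(sequences)
```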
Patterns Within Time Series
 Finding related patterns in two time series
 2003 stock prices of Choice Homes and Home Depot
 2 products show the same sales pattern in summer but a different one in winter
 Solar magnetic wind patterns may predict earth atmospheric changes
Time Series Pattern Discovery
 Time series are sequences of events
 Event could be a transaction (closing daily stock price)
 Look at sequences over n days, or
 Longest period in which change is no greater than 1%
 Comparing
 Must define similarity measures
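The "longest period in which change is no greater than 1%" query can be sketched in one pass. This assumes "change" means the day-to-day relative change in a series of positive closing prices; the prices in the test are hypothetical.

```python
from typing import List, Optional, Tuple


def longest_stable_run(prices: List[float],
                       max_change: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the longest stretch in which every
    day-to-day relative change is at most `max_change`; None for empty input.
    Ties keep the earliest stretch. Assumes positive prices."""
    if not prices:
        return None
    best_start = best_end = run_start = 0
    for i in range(1, len(prices)):
        change = abs(prices[i] - prices[i - 1]) / prices[i - 1]
        if change > max_change:
            run_start = i  # the stable stretch restarts at day i
        if i - run_start > best_end - best_start:
            best_start, best_end = run_start, i
    return best_start, best_end
```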
Other Approaches in Data Mining
 Neural nets
 Infer a function from a set of examples
 Non-parametric curve-fitting
 Interpolates to solve new problems
 Supervised & unsupervised algorithms
 Capabilities
 classification
 time-series prediction
 Disadvantages
 can’t see what it learned (not declarative)
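Inferring a function from examples can be shown with the smallest possible neural unit, a single perceptron trained on the AND function. This is a simplified stand-in for the multi-layer networks the slide refers to; the learning rate and epoch limit are arbitrary choices.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, float], int]

AND_DATA: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(w: Sequence[float], b: float, x: Tuple[float, float]) -> int:
    """Threshold unit: fire (1) when the weighted sum exceeds zero."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(data: List[Example], lr: float = 1.0, max_epochs: int = 100):
    """Classic perceptron rule: nudge weights toward each misclassified example
    until an epoch passes with no errors (guaranteed if data are separable)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in data:
            err = target - predict(w, b, x)
            if err:
                w[0] += lr * err * x[0]
                w[1] += lr * err * x[1]
                b += lr * err
                errors += 1
        if errors == 0:
            break
    return w, b
```

The result is only a list of numbers; with thousands of such weights in a real network, there is no declarative rule to inspect, which is the disadvantage noted above.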
Other Approaches in Data Mining
 Genetic algorithms
 Set up
 Representation (strings over an alphabet)
 Evaluation (fitness) function
 Parameters: # of generations, crossover rate, mutation rate, etc.
 Randomized (probabilistic operators), parallel search over search space
 Used for problem solving and clustering
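The set-up above (string representation, fitness function, generation count, crossover and mutation rates) maps onto a short genetic algorithm. This sketch uses the toy "OneMax" objective (count the 1s in a bit string), tournament selection, and elitism; all of those choices and the parameter values are illustrative assumptions.

```python
import random
from typing import List, Tuple

ALPHABET = "01"


def fitness(s: str) -> int:
    """Toy OneMax objective: number of '1' symbols."""
    return s.count("1")


def crossover(a: str, b: str, rng: random.Random) -> str:
    point = rng.randrange(1, len(a))  # single-point crossover
    return a[:point] + b[point:]


def mutate(s: str, rate: float, rng: random.Random) -> str:
    # Each position is resampled with probability `rate` (may redraw the same symbol).
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in s)


def tournament(pop: List[str], rng: random.Random, k: int = 3) -> str:
    return max(rng.sample(pop, k), key=fitness)


def run_ga(length: int = 20, pop_size: int = 30, generations: int = 40,
           crossover_rate: float = 0.9, mutation_rate: float = 0.02,
           seed: int = 0) -> Tuple[str, List[int]]:
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]
    history: List[int] = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        nxt = [best]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            child = crossover(p1, p2, rng) if rng.random() < crossover_rate else p1
            nxt.append(mutate(child, mutation_rate, rng))
        pop = nxt
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because of elitism the best fitness per generation never decreases, while the random operators keep exploring other parts of the search space.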

More Related Content

Similar to Databases.ppt

The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
Handling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsHandling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web Systems
Vineet Gupta
 
Internship_Presentation
Internship_PresentationInternship_Presentation
Internship_Presentation
Sourabh Gujar
 

Similar to Databases.ppt (20)

Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and Data
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Environment Canada's Data Management Service
Environment Canada's Data Management ServiceEnvironment Canada's Data Management Service
Environment Canada's Data Management Service
 
17-NoSQL.pptx
17-NoSQL.pptx17-NoSQL.pptx
17-NoSQL.pptx
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
 
Case In Point
Case In PointCase In Point
Case In Point
 
Sharing data: Sanger Experiences
Sharing data: Sanger ExperiencesSharing data: Sanger Experiences
Sharing data: Sanger Experiences
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World Foster
 
Sound cloud - User & Partner Conference - AT Internet
Sound cloud - User & Partner Conference - AT InternetSound cloud - User & Partner Conference - AT Internet
Sound cloud - User & Partner Conference - AT Internet
 
DEVNET-1163 Data in Motion APIs
DEVNET-1163	Data in Motion APIsDEVNET-1163	Data in Motion APIs
DEVNET-1163 Data in Motion APIs
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
 
Design and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management SystemDesign and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management System
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor Networks
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
data analytics.pptx
data analytics.pptxdata analytics.pptx
data analytics.pptx
 
Handling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsHandling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web Systems
 
Internship_Presentation
Internship_PresentationInternship_Presentation
Internship_Presentation
 

Recently uploaded

Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
23050636
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
mikehavy0
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
yulianti213969
 
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptxClient Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Stephen266013
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
saurabvyas476
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh +966572737505 get cytotec
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Stephen266013
 

Recently uploaded (20)

Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptxClient Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 

Databases.ppt

  • 1. Smart Home Technologies Data Management and Databases
  • 2. Databases for Smart Homes  Requirements  Database Types  Database Technologies  Smart Home Databases  Data Mining
  • 3. Data Storage Requirements  Sensor data  Temperature (15 @ 8 Kbps)  Humidity (15 @ 8 Kbps)  Gas (15 @ 8 Kbps)  Light (15 @ 8 Kbps)  Motion (15 @ 8 Kbps)  Pressure (100 @ 8 Kbps)  Microphone (15 @ 500 Kbps)  Camera (15 @ 10 Mbps)
  • 4. Data Storage Requirements  User data  Multimedia  Phone messages/conversations (500 Kbps – 10 Mbps)  Music (500 Kbps)  TV/Radio broadcasts (500 Kbps – 10 Mbps)  Home movies (10 Mbps)  Images  Computer  Programs  Data files  Operating systems
  • 5. Data Storage Issues  Issues  Query frequency and type  Sampling/recording rates  205 sensors (158,900 Kbps)  Multimedia recordings  Simultaneous playback  Analysis, prediction, decision-making queries  Transaction granularity  Historical data, decay  Security and privacy  Centralized vs. distributed
  • 6. What Data to Store  Type of Data  Raw data  Pre-processed  Compressed  Frequency of Data Storage for Sensor Data  Tradeoff between precision and quantity
  • 7. Sensor Data Example  9/8/2002 2:0:1 AM~A5 (Coffee Maker) ON  9/8/2002 1:6:59 AM~A9 (A/C) ON  9/8/2002 3:58:52 AM~A0 (Stereo) ON  9/8/2002 5:57:0 AM~A2 (Kitchen Light) ON  9/8/2002 3:1:42 AM~A5 (Coffee Maker) OFF  9/8/2002 7:8:3 AM~A3 (Stove) ON  9/8/2002 12:54:52 PM~A10 (Bathroom Light) ON  9/8/2002 4:58:5 AM~A0 (Stereo) OFF  9/8/2002 8:1:20 AM~A3 (Stove) OFF  9/8/2002 9:6:10 AM~A8 (Computer) ON  9/8/2002 10:8:19 AM~A4 (Bathtub Heater) ON  9/8/2002 11:9:4 AM~A0 (Stereo) ON  9/8/2002 9:4:5 AM~A8 (Computer) OFF  9/8/2002 10:9:4 AM~A4 (Bathtub Heater) OFF  9/8/2002 2:2:5 PM~A10 (Bathroom Light) OFF  9/8/2002 2:52:37 PM~A0 (Stereo) OFF  9/8/2002 4:2:0 PM~A9 (A/C) OFF
  • 8. Media Viewing Example Watching Events Date Day Mood Start End Device Program name Type Comments Others Rating 020302 Su normal 1330 1600 T nba basketball sports dallas mavericks go team none 5 020302 Su normal 1700 2100 t super bowl sports gotta watch the commercials Dad 5 020402 m normal 1900 2000 t boston public drama hot teachers none 5 020402 m normal 2000 2100 t ally mcbeal drama funny lawyers none 4 020402 m normal 2300 100 V WWF RAW wrestling testosterone none 5 020502 t normal 2100 2200 t philly drama hot lawyers none 4 020602 w bored 1830 2200 t nba basketball sports GO MAVS none 5 020702 th tired 1900 2100 t wwf smackdown wrestling its me soap none 5 020702 th tired 2100 2200 t ER drama good show none 4 020802 f excited 1900 2230 t olympics sports gotta watch none 4 020902 sa excited 1900 2230 t olympics sports gotta watch none 4 021002 su ecstatic 1500 1800 t NBA allstar game sports gotta see what happens none 3 012802 M normal 1900 2000 T Boston Public Drama hot chicks teaching none 5 012802 M normal 2000 2100 T Ally McBeal Drama hot chicks lawyering none 5
  • 9. Multimedia Example  Digital Silhouettes (Predictive Networks)  Predicting web surfing behavior ($$$)  Microsoft (2002) track TV viewing preferences  140 data items for each user  Demographics (50)  Subcategories within gender, age, income, education, occupation, and race  90 Content preferences  golf, music, yoga
  • 10. Database Types / Data Models  Relational  OO  Hybrid (Object-Relational)  Temporal  Deductive  Others  Spatial, …
  • 11. Example Data Representations  Relational  We all know…flat tables of atomic attributes with foreign key relationships  OO  Complex data reps  multivalued, composite  Temporal  Relational model: add valid start, end dates to each table (versions of info and when valid)  Includes time, events, durations…
  • 12. Operations  DDL/DML (data def/manip languages)  SQL  OQL  Update operations  Built-in insert, delete, update  Stored procedures for triggers, active (ECA) rules
  • 13. Example Operations for Temporal Databases  INCLUDES  Rows valid in a certain time period  BEFORE/AFTER a time condition  Set operations  Union, intersection of 2 time periods
  • 14. Active DB  Event-Condition-Action rules  Allow for decisions to be made in the database instead of a separate application  Relational  Implemented as triggers  Challenges  Rule consistency  (2+ rules do not contradict)  Guaranteed termination  Trigger loops (T1 <->T2)
  • 15. Smart Home Active DB Example  Java, Postgres, Jess rules  Event classification (local&composite)  Data Manipulation Events  TV show being viewed (channel, time, genre…)  Temporal Events (instance,recurring)  Set temp to 70 degrees at 7:00am workdays  Exception Events  Power failure  Behavioral Events  Time children home from school; dinner time
  • 16. Active DB Example (TCU) Title Event Condition Action TV View Menu TV turned on Molly is holding remote Display shows matching Molly’s preferences Entry Lighting Inhabitant enters house Light level <threshold Adjust lighting to predetermined level Aroma- therapy Every Friday night when Hanna sits on sofa Always Release aroma Night Idle John on sofa idle > 15 minutes, TV&lights are on No other inhabitant in room Turn off all devices in the room
  • 17. Distributed vs. Centralized  Centralized database can produce a bottleneck  Large volume of data input  Large database  Large volume of queries  In distributed databases, data consistency, replication, and retrieval can be more problematic  Consistency of schemas  Retrieval in case the data location is not known  Communication overhead to ensure database consistency
  • 18. SmartHome Database Architecture  Centralized vs. distributed?  Answer: Both  Central storage of high demand, persistent data  Distributed storage of low demand, dynamic data  Distributed queries  Push processing toward sensors  Adaptive, hierarchical organization  End-effector autonomy (“smart sensor”)
  • 19. Database Systems  Commercial  DB2  Empress  Informix  Oracle  MS Access  MS SQL  Sybase  Free  Berkeley DB  PostgreSQL  MySQL
  • 20. UTA MavHome DB  Active  Reactive & proactive (e.g., to predict)  Distributed  Information collection agents  Rules  Local Agent: what data they need to collect  Distributed: coordinate overall monitoring of collected information  Continuous monitoring of events  Extension of SNOOP
  • 21. Microsoft Easy Living DB (2002)
    - Relational
      - Fast & robust, but awkward for some data
    - World Model DB describes:
      - Computing devices
      - People and their personal preferences/settings
      - Services
      - Rooms and doorways
    - Serves as an abstraction layer between sensors and the applications that use data from sensors
      - e.g., new sensors → no change to applications
  • 22. Stanford Interactive Workspace
    - Uses LORE
      - A semi-structured XML DB system
      - Still available, but work stopped in 2000
    - Data stored is a catalog of (index to) documents, images, 3-D models, and application-specific domain models
  • 23. Sensor Database Systems
    - COUGAR project
      - www.cs.cornell.edu/database/cougar
    - Query processing over ad-hoc sensor networks
    - Small database component (QueryProxy) at each sensor
    - Sensor clusters provide local aggregations (e.g., min, max, mean)
    - Assumes a centralized index of all data sources
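Local aggregation only works if cluster results can be combined upstream, and a mean cannot be combined from other means alone. A minimal sketch, assuming each cluster ships a count, sum, min, and max instead (the `PartialAgg` class and the readings are hypothetical, not part of COUGAR):

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class PartialAgg:
    """Hypothetical partial aggregate a sensor cluster could send upstream."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(readings: Iterable[float]) -> "PartialAgg":
        vals = list(readings)
        if not vals:
            raise ValueError("need at least one reading")
        return PartialAgg(len(vals), sum(vals), min(vals), max(vals))

    def merge(self, other: "PartialAgg") -> "PartialAgg":
        # Counts and sums add; mins and maxes combine. Means alone would not merge.
        return PartialAgg(self.count + other.count, self.total + other.total,
                          min(self.minimum, other.minimum),
                          max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count

kitchen = PartialAgg.of([20.0, 22.0])          # cluster 1 temperatures
bedroom = PartialAgg.of([18.0, 19.0, 21.0])    # cluster 2 temperatures
house = kitchen.merge(bedroom)
print(house.mean, house.minimum, house.maximum)  # 20.0 18.0 22.0
```

Averaging the two cluster means (21.0 and about 19.33) would give about 20.17, which is why the sum and count travel together.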
  • 24. Siemens Netabase
    - "The network is the database."
      - Navas and Wynblatt, ACM SIGMOD 2001
    - Sensor networks
      - Large number of data sources (10^5)
      - Volatile data and data organization
      - "Thin" data servers on scaled-down hardware
    - Netabase approach
      - Query decomposition
      - Characteristic routing (à la IP routing)
      - Local joins
      - Query evaluation
  • 26. Data Warehouses
    - Repositories for data mining activities
    - Aggregates/summaries of data help efficiency
    - Optimized for decision support, not transaction processing
    - Definition (Elmasri, page 900)
      - "A subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decisions"
      - Replace "management" with "smart home agents"
  • 27. Warehouse Properties
    - Very large: 100 gigabytes to many terabytes
    - Tends to include historical data
    - Workload: mostly complex queries that access lots of data and do many scans, joins, and aggregations; they tend to look for "the big picture"
    - Updates are pumped to the warehouse in batches (overnight)
    - Data may be heavily summarized and/or consolidated in advance (this must also be done in batches and must finish overnight)
    - Research work has been done (e.g., "materialized views"), but it covers only a small piece of the problem
    02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
  • 28. Data Warehouses
    - Data cleaning
      - Data migration: simple transformation rules (replace "gender" with "sex")
      - Data scrubbing: use domain-specific knowledge (e.g., zip codes) to modify data; try parsing and fuzzy matching from multiple sources
      - Data auditing: discover rules and relationships (or signal violations thereof); not unlike data mining
    - Data loading
      - Can take a very long time (sorting, indexing, summarization, integrity constraint checking, etc.); parallelism is a must
      - Full load: like one big transaction; the change from old data to new is atomic
      - Incremental loading ("refresh") makes sense for big warehouses, but the transaction model is more complex: the load must be broken into many transactions that are committed periodically to avoid locking everything, and metadata and indices must be kept consistent along the way
    02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
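Incremental loading can be sketched as chunked inserts with a commit after each chunk. This assumes a hypothetical `device_events` table; `batch_size` and the helper names are illustrative. Because earlier chunks are already committed, a failed load leaves a partial refresh behind, which is why the metadata and index bookkeeping on this slide matters.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (day, device, state), hypothetical schema

def _commit_chunk(conn: sqlite3.Connection, chunk: List[Row]) -> None:
    with conn:  # each chunk is its own short transaction
        conn.executemany("INSERT INTO device_events VALUES (?, ?, ?)", chunk)

def incremental_load(conn: sqlite3.Connection, rows: Iterable[Row],
                     batch_size: int = 1000) -> int:
    """Load rows in chunks, committing after each one so no single
    transaction locks the table for the whole load. Returns the commit count."""
    commits = 0
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == batch_size:
            _commit_chunk(conn, chunk)
            commits += 1
            chunk = []
    if chunk:                       # leftover partial chunk
        _commit_chunk(conn, chunk)
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_events (day TEXT, device TEXT, state TEXT)")
rows = (("2002-09-08", f"A{i % 11}", "ON") for i in range(2500))
print(incremental_load(conn, rows))  # 3
print(conn.execute("SELECT COUNT(*) FROM device_events").fetchone()[0])  # 2500
```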
  • 29. Data Warehouses
    02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
  • 30. Data Mining Definition
    - Discovery of new information, in the form of patterns or rules, from vast amounts of data
    - Extracts patterns that can't readily be found by asking the right questions (queries)
    - TOO MUCH DATA FOR HUMANS
    - Emerged from:
      - Artificial intelligence: machine learning, neural nets, genetic algorithms
      - Statistics
      - Operations research
  • 31. Data Mining Steps
    - Data selection: pick the data needed
    - Data cleansing
      - Fix bad data (e.g., spelling, zip codes)
      - Missing, erroneous, conflicting, and redundant data are hard to deal with
    - Enrichment
      - Add data (e.g., age, gender, income)
    - Data transformation
      - Aggregate (e.g., zip codes → regions)
    - Data mining
    - Reporting on discovered knowledge
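The steps can be illustrated on the device log format from the sensor data example (`9/8/2002 2:0:1 AM~A5 (Coffee Maker) ON`). A minimal sketch, assuming that line format, an English/C locale for AM/PM parsing, and a weekday/weekend flag as the enrichment: select one device, drop lines that fail to parse, enrich, and aggregate ON events by hour of day as input to the mining step. The corrupted line is invented to show cleansing.

```python
import re
from collections import Counter
from datetime import datetime
from typing import List, Optional, Tuple

# Assumed log format: "<M/D/YYYY h:m:s AM|PM>~<code> (<device>) <ON|OFF>"
LINE = re.compile(r"^(?P<ts>.+?)~(?P<code>A\d+) \((?P<name>[^)]*)\) (?P<state>ON|OFF)$")

def parse(line: str) -> Optional[Tuple[datetime, str, str]]:
    """Return (timestamp, device, state), or None if the line is malformed."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    try:
        ts = datetime.strptime(m["ts"], "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        return None
    return ts, m["name"], m["state"]

def on_events_by_hour(lines: List[str], device: str) -> Counter:
    # 1. Selection: keep only lines for the device of interest.
    selected = [ln for ln in lines if f"({device})" in ln]
    # 2. Cleansing: drop lines that do not parse.
    records = [r for r in map(parse, selected) if r is not None]
    # 3. Enrichment: derive a weekend flag (Saturday=5, Sunday=6).
    enriched = [(ts, state, ts.weekday() >= 5) for ts, _, state in records]
    # 4. Transformation: count ON events per (weekend?, hour of day).
    return Counter((weekend, ts.hour) for ts, state, weekend in enriched
                   if state == "ON")

log = [
    "9/8/2002 3:58:52 AM~A0 (Stereo) ON",
    "9/8/2002 4:58:5 AM~A0 (Stereo) OFF",
    "9/8/2002 11:9:4 AM~A0 (Stereo) ON",
    "9/8/2002 ??~A0 (Stereo) ON",              # corrupted timestamp
    "9/8/2002 2:0:1 AM~A5 (Coffee Maker) ON",
]
print(on_events_by_hour(log, "Stereo"))  # Counter({(True, 3): 1, (True, 11): 1})
```

September 8, 2002 was a Sunday, so both stereo events carry the weekend flag.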
  • 32. Types of Results
    - Association rules
      - Buy diapers → buy lots of beer
    - Sequential patterns
      - Buy house → buy furniture within months
    - Classification trees
      - Types of buyers (upscale, bargain-conscious, …)
    - Why do it?
      - Make more money
      - Science & medicine
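Association rules are commonly scored by support (how often the items occur together) and confidence (how often the right-hand side holds when the left-hand side does). A minimal sketch over toy baskets; the data is invented for illustration:

```python
from typing import List, Set

def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """Estimated P(rhs | lhs) for the rule lhs -> rhs."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        raise ValueError("left-hand side never occurs")
    return support(lhs | rhs, transactions) / lhs_support

baskets = [  # invented toy data
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "bread"},
    {"milk", "bread"},
]
print(support({"diapers", "beer"}, baskets))                  # 0.5
print(round(confidence({"diapers"}, {"beer"}, baskets), 3))  # 0.667
```

Rule miners typically keep only rules above user-chosen minimum support and confidence thresholds.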
  • 33. Data Mining Goals
    - Find patterns to predict future events
    - Find major groupings
      - Groupings of buyers, stars, diseases, …
    - Find which group something belongs to
      - e.g., creditworthiness
  • 34. Data Mining Results
    - Association rules
    - Classification hierarchies
    - Clustering
    - Sequential patterns
    - Patterns within time series
    - The type of result, inputs, and algorithms vary
    - Often interested in some combination of these types of knowledge
  • 35. Clustering
    - Unsupervised learning techniques
      - Training samples are unclassified
      - vs. supervised learning (classification)
    - Drug categories for depression
    - Categories of TV viewers
    - Categories of buyers (likely, unlikely)
    - Categories of households?
      - Single male, mother/children, conventional (M/D/kids), DINKs
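Clustering can be illustrated with k-means (Lloyd's algorithm), one common unsupervised technique; the slide does not name a specific algorithm. The sketch assumes hypothetical TV-viewer features (hours of TV per day, fraction of sports programs) and starts the centroids at the first k points so the output is reproducible.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means. Centroids start at the first k points (requires
    len(points) >= k) so results are reproducible; returns one label per point."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical viewer features: (hours of TV per day, fraction of sports shows)
viewers = [(1.0, 0.9), (1.2, 0.8), (5.0, 0.1), (5.5, 0.2)]
print(kmeans(viewers, 2))  # [0, 0, 1, 1]
```

No labels are supplied up front: the two groups (light sports fans vs. heavy non-sports viewers) emerge from the data, which is what separates clustering from classification.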
  • 36. Sequential Patterns
    - Detecting associations among events with certain temporal relationships
    - Example:
      - Cardiac bypass for blocked arteries
      - AND, within 18 months, high blood urea
      - THEN kidney failure is likely in the next 18 months
    - Particularly important in smart homes
  • 37. Sequential Pattern Discovery
    - Sequence of itemsets
      - Grocery store purchases by one person (3 itemsets):
        {soy milk, bread, chocolate}, {bananas, chocolate}, {lettuce, tomato, chocolate}
      - Two of its subsequences:
        {soy milk, bread, chocolate}, {bananas, chocolate}
        {bananas, chocolate}, {lettuce, tomato, chocolate}
  • 38. Sequential Pattern Discovery
    - The support for a sequence S is the percentage of the given set U of sequences of which S is a subsequence
      - That is: in what fraction of the sequences in U does S show up?
    - Find all subsequences from the given sequence sets that have a user-defined minimum support
    - The sequence S1, S2, …, Sn predicts that a customer who buys itemset S1 is likely to buy itemset S2, then S3, …
      - Prediction support is based on the frequency of this sequence in the past
    - Many research issues in creating good algorithms
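The subsequence test and the support definition translate directly into code. A minimal sketch, assuming each sequence is a list of itemsets in time order: S is a subsequence of T when each itemset of S is contained in a distinct itemset of T, in the same order, and support counts each sequence in U at most once. The first shopper is the one from the previous slide; the other two are invented.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]

def is_subsequence(s: Sequence[Itemset], t: Sequence[Itemset]) -> bool:
    """True if each itemset of s is contained in a distinct itemset of t,
    in the same order. Greedy earliest matching is sufficient."""
    i = 0
    for itemset in t:
        if i < len(s) and s[i] <= itemset:
            i += 1
    return i == len(s)

def sequence_support(s: Sequence[Itemset], u: List[Sequence[Itemset]]) -> float:
    """Fraction of sequences in U that contain s (each counted at most once)."""
    return sum(is_subsequence(s, t) for t in u) / len(u)

f = frozenset
u = [
    [f({"soy milk", "bread", "chocolate"}), f({"bananas", "chocolate"}),
     f({"lettuce", "tomato", "chocolate"})],        # shopper from the slide
    [f({"bread"}), f({"chocolate", "tomato"})],     # invented
    [f({"bananas"})],                               # invented
]
s = [f({"bread"}), f({"chocolate"})]
print(sequence_support(s, u))  # 0.6666666666666666
```

A pattern miner would enumerate candidate sequences and keep those whose `sequence_support` meets the user-defined minimum.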
  • 39. Patterns Within Time Series
    - Finding patterns relating 2 series over time
      - 2003 stock prices of Choice Homes and Home Depot
      - 2 products show the same sales pattern in summer but a different one in winter
      - Solar magnetic wind patterns may predict changes in Earth's atmosphere
  • 40. Time Series Pattern Discovery
    - Time series are sequences of events
      - An event could be a transaction (e.g., closing daily stock price)
    - Look at sequences over n days, or
    - at the longest period in which the change is no greater than 1%
    - Comparing series
      - Must define similarity measures
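The 1% criterion can be read in more than one way; the sketch below assumes day-to-day relative change and returns the inclusive index range of the longest qualifying run. Euclidean distance is shown as one simple similarity measure for equal-length series (others exist, such as dynamic time warping). The closing prices are invented.

```python
import math
from typing import Sequence, Tuple

def longest_stable_run(prices: Sequence[float],
                       max_change: float = 0.01) -> Tuple[int, int]:
    """Inclusive (start, end) indices of the longest run in which every
    day-to-day relative change is at most max_change (assumed reading of 1%)."""
    if not prices:
        raise ValueError("empty series")
    best = (0, 0)
    start = 0
    for i in range(1, len(prices)):
        if abs(prices[i] - prices[i - 1]) / prices[i - 1] > max_change:
            start = i  # run broken; a new run starts at day i
        if i - start > best[1] - best[0]:
            best = (start, i)
    return best

def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between equal-length series; smaller = more similar."""
    return math.dist(a, b)

closing = [100.0, 100.5, 101.0, 105.0, 105.5, 105.9, 106.2, 110.0]
print(longest_stable_run(closing))       # (3, 6)
print(distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```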
  • 41. Other Approaches in Data Mining
    - Neural nets
      - Infer a function from a set of examples
      - Non-parametric curve fitting
      - Interpolate to solve new problems
      - Supervised & unsupervised algorithms
    - Capabilities
      - Classification
      - Time-series prediction
    - Disadvantages
      - Can't see what it learned (not declarative)
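A single perceptron is the smallest example of a neural net inferring a function from examples. In this sketch the inputs are hypothetical (motion detected, room is dark) and the target "turn the light on" is their logical AND; the classic perceptron update with learning rate 1 is used so the arithmetic stays exact. The learned weights classify correctly, but they do not read as a rule, which is the non-declarative drawback listed on the slide.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]

def step(w: List[int], b: int, x: Sequence[int]) -> int:
    """Threshold unit: fire (1) if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data: List[Example], epochs: int = 20) -> Tuple[List[int], int]:
    """Classic perceptron rule (learning rate 1): nudge weights toward each
    misclassified example. Converges if the data is linearly separable."""
    w = [0] * len(data[0][0])
    b = 0
    for _ in range(epochs):
        for x, target in data:
            err = target - step(w, b, x)
            w = [wi + err * xi for wi, xi in zip(w, x)]
            b += err
    return w, b

# Hypothetical inputs: (motion detected, room is dark) -> turn light on.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(examples)
print([step(w, b, x) for x, _ in examples])  # [0, 0, 0, 1]
print(w, b)  # [2, 1] -2
```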
  • 42. Other Approaches in Data Mining
    - Genetic algorithms
      - Setup
        - Representation (strings over an alphabet)
        - Evaluation (fitness) function
        - Parameters: number of generations, crossover rate, mutation rate, etc.
      - Randomized (probabilistic operators), parallel search over the search space
      - Used for problem solving and clustering
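The setup above maps onto a short genetic algorithm. A minimal sketch, assuming bit-string individuals, tournament selection, one-point crossover, bit-flip mutation, and elitism (common choices, not ones prescribed by the slide), with the toy OneMax fitness (count of 1 bits) standing in for a real evaluation function. The population is the "parallel" part of the search: many candidates are explored at once.

```python
import random
from typing import Callable

def genetic_search(fitness: Callable[[str], int], length: int, pop_size: int = 20,
                   generations: int = 50, crossover_rate: float = 0.9,
                   mutation_rate: float = 0.01, seed: int = 0) -> str:
    """Evolve bit strings (length >= 2) toward higher fitness."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: best individual survives as-is
        while len(nxt) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            if rng.random() < crossover_rate:   # one-point crossover
                cut = rng.randrange(1, length)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation, applied independently to each position.
            child = "".join(("1" if c == "0" else "0") if rng.random() < mutation_rate
                            else c for c in child)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

def onemax(bits: str) -> int:  # toy fitness: number of 1 bits
    return bits.count("1")

best = genetic_search(onemax, length=16, seed=1)
print(best, onemax(best))
```

Elitism guarantees the best fitness never decreases between generations; the fixed seed makes a run repeatable.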