UCSC – University of
• RELATIONAL SCHEMA
• STAR SCHEMA
• BIG DATA
• FAST DATA
• SCHEMA ON READ – SCHEMA ON WRITE
• In the relational model, every tuple must have a unique identifier, or key, based on the data
• Foreign key
• Primary key
There are two intersection entities in this schema: student/course and employee/course. These
handle the two many-to-many relationships:
1) between student and course
2) between employee and course.
In the first case, a student may take many courses and a course may be taken by many students.
Similarly, in the second case, an employee (one of the types of teachers) may teach many courses
and a course may be taught by many teachers.
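The intersection-entity idea described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the slides: the table names, column names, and sample rows are all assumptions, and only the student/course pairing is shown (employee/course would follow the same pattern).

```python
import sqlite3

# In-memory database; all table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Intersection entity: one row per (student, course) pairing.
-- Its primary key is the combination of the two foreign keys.
CREATE TABLE student_course (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(10, "Databases"), (20, "Statistics")])
conn.executemany("INSERT INTO student_course VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])


def courses_for(student_id):
    """Titles of all courses taken by one student."""
    rows = conn.execute(
        "SELECT c.title FROM course c "
        "JOIN student_course sc ON sc.course_id = c.course_id "
        "WHERE sc.student_id = ? ORDER BY c.title",
        (student_id,),
    ).fetchall()
    return [r[0] for r in rows]


def students_in(course_id):
    """Names of all students enrolled in one course."""
    rows = conn.execute(
        "SELECT s.name FROM student s "
        "JOIN student_course sc ON sc.student_id = s.student_id "
        "WHERE sc.course_id = ? ORDER BY s.name",
        (course_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

With the sample rows, courses_for(1) walks the relationship from the student side and students_in(10) walks it from the course side. Inserting a pairing that refers to a nonexistent student raises sqlite3.IntegrityError, which is the foreign key doing its job.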
• In computing, the star schema is the simplest style of data mart schema.
• The star schema consists of one or more fact tables referencing any number of dimension tables.
• The star schema is an important special case of the snowflake schema, and is more effective for
handling simpler queries.
• In data warehousing and business intelligence, a star schema is the simplest form of a dimensional
model.
• Star schemas are optimized for querying large data sets and are used in data warehouses and data marts to support
cubes, business intelligence and analytic applications, and ad hoc queries.
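As a rough sketch of the fact-table and dimension-table layout described above, the example below builds a tiny star schema with Python's built-in sqlite3 module. The sales scenario, the table names (fact_sales, dim_date, dim_product), and the sample figures are all hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to filter and group.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
-- Fact table: numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2023, 12), (2, 2024, 1), (3, 2024, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Books"), (2, "Games")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 10.0),   # Books, December 2023
    (2, 1, 2, 20.0),   # Books, January 2024
    (3, 1, 3, 30.0),   # Books, February 2024
    (2, 2, 1, 45.0),   # Games, January 2024
])


def sales_by_category(year):
    """Total sales amount per product category for one year."""
    rows = conn.execute(
        "SELECT p.category, SUM(f.amount) "
        "FROM fact_sales f "
        "JOIN dim_product p ON p.product_key = f.product_key "
        "JOIN dim_date d ON d.date_key = f.date_key "
        "WHERE d.year = ? "
        "GROUP BY p.category ORDER BY p.category",
        (year,),
    ).fetchall()
    return dict(rows)
```

The query shape is typical of a star schema: start from the fact table, join outward to each dimension it needs, filter on one dimension (year), and group by another (category).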
WHO USES BIG DATA?
WHAT DO THEY USE BIG DATA FOR?
• TO MEET THEIR BUSINESS/STRATEGIC OBJECTIVES
WHAT IS BIG DATA?
THE INFORMATION OWNED BY YOUR COMPANY, OBTAINED AND PROCESSED
THROUGH NEW TECHNIQUES TO PRODUCE VALUE IN THE BEST WAY POSSIBLE.
• It's a lot of data produced very quickly in many different forms. This could involve customer
transactional histories, production databases, web traffic logs, online videos, social media
WHAT IS UNIQUE ABOUT BIG DATA?
• It represents both significant information - which can open new doors - and the way this
information is analyzed to help open those doors.
• Combing the data to find value
• Make the best use of information to improve their business capabilities
CONCEPT – FROM DATA AT REST TO DATA IN MOTION
Measure volume in terms of time
• The number of megabytes per second
• Gigabytes per hour
• Terabytes per day
• Value of fast data:
• React to data the instant it arrives
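The rate units listed above are the same quantity expressed on different time scales. The small helper below converts between them; the function name is hypothetical, and it assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB).

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400


def rate_conversions(megabytes_per_second):
    """Express a data rate in MB/s as GB/hour and TB/day (decimal units assumed)."""
    return {
        "mb_per_second": megabytes_per_second,
        "gb_per_hour": megabytes_per_second * SECONDS_PER_HOUR / 1_000,
        "tb_per_day": megabytes_per_second * SECONDS_PER_DAY / 1_000_000,
    }


rates = rate_conversions(100)
```

A stream arriving at 100 MB per second works out to 360 GB per hour, or 8.64 TB per day.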
SCHEMA ON WRITE
• When data is shared with people who have different interests in it, you have to think about all of these
constituencies in advance and define a schema that has something for everyone.
• You have to do an extensive data modeling job and develop an über-schema that covers all of the datasets that you
care about, and you have to consider whether the schema will handle each new data set.
SCHEMA ON READ
• You can present data in a schema that is best adapted to the queries being issued.
• You do not need to do such modeling exercises.
• You can load your data as-is and start to get value from it right away, which is important when dealing with structured,
unstructured, and semi-structured data.
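The contrast between the two approaches can be sketched in a few lines of Python using only the standard library. The event records, field names, and the read_with_schema helper are hypothetical. The point is only where the schema gets applied: before storage in the first half, at query time in the second.

```python
import json
import sqlite3

# Two raw records with different shapes (hypothetical sample data).
raw_events = [
    '{"username": "ana", "action": "view", "ms": 120}',
    '{"username": "ben", "action": "buy", "amount": 9.5}',
]

# Schema on write: the table structure is fixed before any data is loaded,
# so each record is forced into the agreed columns and extra fields are dropped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (username TEXT NOT NULL, action TEXT NOT NULL)")
for line in raw_events:
    record = json.loads(line)
    conn.execute("INSERT INTO events VALUES (?, ?)",
                 (record["username"], record["action"]))
written = conn.execute(
    "SELECT username, action FROM events ORDER BY username"
).fetchall()

# Schema on read: the raw lines are stored as-is, and each query chooses
# the fields it cares about at read time.
stored = list(raw_events)


def read_with_schema(lines, fields):
    """Project each raw record onto the requested fields (None if absent)."""
    return [tuple(json.loads(line).get(f) for f in fields) for line in lines]


purchases = read_with_schema(stored, ("username", "amount"))
timings = read_with_schema(stored, ("username", "ms"))
```

In the schema-on-write half, the amount and ms fields never reach the table because the schema did not anticipate them. In the schema-on-read half, the same stored lines answer both a purchases query and a timings query, at the cost of handling missing fields at read time.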