BY: TEAM 7
Chintan Koticha (001267049)
Payal Dodeja (001224158)
Siddhant Chandiwal (001286480)
The Internet Movie Database (IMDB) is an online database of information about
movies, TV shows, celebrities, genres, reviews, and more. The IMDB website lets registered users
rate movies, TV shows and actors on a scale of 1 to 10. It also lets users search
for movies or TV shows across genres on a single platform.
The prime focus of this project is to provide a generic, functional database
modelled on IMDB for accessing entertainment industry information online.
For this purpose, we gathered data from the IMDB website and
ran a range of SQL queries designed around the user’s
perspective and expectations.
ADDRESS(AddressId,StreetLine,ZipCode,CityID)
AWARDCATEGORY(AwardId,AwardName,AwardTypeId)
AWARDTYPE(AwardTypeId,AwardTypeName,Date,Description,Location)
BOXOFFICE(BoxOfficeID, Budget,OpeningWeekend,Grossincome,MovieId)
CELEBRITY(CelebrityId ,CelebrityName,DateOfBirth,PlaceOfBirth,ShortBiograpgy,Gender)
CHANNEL(ChannelId,ChannelName)
COMPANYCREDITS(CompanyCreditId,ProductionCompany,MovieID)
DIRECTORS(DirectorID, DirectorName,ShortBiography,DateOfBirth,PlaceOfBirth)
EPISODE(EpisodeID, EpisodeNumber,EpisodeName,Description,SeasonID)
GENRE(GenreID, GenreName)
MOVIE(MovieID,MovieName,MovieShortDescription,ReleaseDate,MovieDuration,TelevisionContentRatingSystem,WatchTrailerURL,MovieRating,MovieTotalVotes,ReviewID)
POLL(PollID,PollName,PollDescription,URL,FeaturedPoll,HOtlyContestedPoll,PollType,ThumbnailImage,DirectorId,CelebrityID,TvShowID,MovieID,UserID)
USERACCOUNT(UserID,FirstName,LastName,Email,Password,CityID)
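A few of the relations above can be sketched as DDL. The following is a minimal illustration only: it uses an in-memory SQLite database through Python's `sqlite3` module rather than MS SQL, Oracle or Hive, it covers a subset of the columns, and the column types, keys and constraints are assumptions, since the schema above lists names only.

```python
import sqlite3

# In-memory SQLite database used purely for illustration; the project itself
# targeted MS SQL, Oracle and Hive.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Column types, keys and NOT NULL constraints below are assumptions.
conn.executescript("""
CREATE TABLE GENRE (
    GenreID   INTEGER PRIMARY KEY,
    GenreName TEXT NOT NULL
);
CREATE TABLE MOVIE (
    MovieID         INTEGER PRIMARY KEY,
    MovieName       TEXT NOT NULL,
    ReleaseDate     TEXT,
    MovieRating     REAL,
    MovieTotalVotes INTEGER
);
CREATE TABLE BOXOFFICE (
    BoxOfficeID    INTEGER PRIMARY KEY,
    Budget         REAL,
    OpeningWeekend REAL,
    Grossincome    REAL,
    MovieId        INTEGER NOT NULL REFERENCES MOVIE(MovieID)
);
""")

# List the tables that were created, in alphabetical order.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Declaring `BOXOFFICE.MovieId` as a foreign key to `MOVIE` is one plausible reading of the schema, not something the listing confirms.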
Predicting Movie Rating and its Votes
Total Income Generated from Movies from each State and each Country
Function to find most Networked Celebrity
Average income of movies on the basis of genre
Total Income Generated from Movies from each State and each Country
TV Show on Tonight
List of all Tables generated in Hive
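Of the queries listed above, the average income of movies by genre is the easiest to sketch. The example below is an illustration rather than the project's actual query: it runs on SQLite through Python's `sqlite3` module, the link table `MOVIEGENRE` is hypothetical (the listed schema does not show how MOVIE maps to GENRE), and the rows are made-up toy values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GENRE (GenreID INTEGER PRIMARY KEY, GenreName TEXT);
CREATE TABLE BOXOFFICE (BoxOfficeID INTEGER PRIMARY KEY, Grossincome REAL, MovieId INTEGER);
-- Hypothetical link table: the listed schema does not show the MOVIE-to-GENRE mapping.
CREATE TABLE MOVIEGENRE (MovieID INTEGER, GenreID INTEGER);
""")

# Toy rows, invented for the example only.
conn.executemany("INSERT INTO GENRE VALUES (?, ?)", [(1, "Action"), (2, "Drama")])
conn.executemany("INSERT INTO BOXOFFICE VALUES (?, ?, ?)",
                 [(1, 100.0, 10), (2, 300.0, 11), (3, 50.0, 12)])
conn.executemany("INSERT INTO MOVIEGENRE VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Average gross income per genre, highest first.
AVG_INCOME_BY_GENRE = """
SELECT g.GenreName, AVG(b.Grossincome) AS AvgGross
FROM GENRE g
JOIN MOVIEGENRE mg ON mg.GenreID = g.GenreID
JOIN BOXOFFICE b ON b.MovieId = mg.MovieID
GROUP BY g.GenreName
ORDER BY AvgGross DESC
"""
rows = conn.execute(AVG_INCOME_BY_GENRE).fetchall()
for genre, avg_gross in rows:
    print(genre, avg_gross)
```

The same `GROUP BY` pattern would apply to the income-per-state and income-per-country queries, with the grouping column swapped for a location column.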
The model estimates the gross collection and rating of upcoming movies on the basis of the popularity of the
celebrities and director involved and of previous movie reviews.
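The slides do not say which estimation technique the model uses. As one possible reading, the sketch below fits a one-variable least-squares line that predicts gross collection from a previous rating. The function names, the single predictor and the toy numbers are all assumptions made for illustration; a real model would also bring in celebrity and director popularity.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Toy values invented for the example: previous ratings (1 to 10 scale)
# and gross collection in arbitrary units.
ratings = [6.0, 7.0, 8.0, 9.0]
gross = [50.0, 70.0, 90.0, 110.0]

slope, intercept = fit_line(ratings, gross)
estimate = predict(slope, intercept, 7.5)
print(slope, intercept, estimate)
```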
• TRANSACTION CONTROL: A transaction is a group of operations or tasks that should be treated
as a single unit. By default, MS SQL Server executes and commits each command individually (autocommit mode), so it is
difficult or impossible to roll back changes if an error is encountered along the way. To group statements, the
“BEGIN TRANSACTION” command declares the beginning of a transaction, and either a COMMIT or a ROLLBACK
statement ends it. In Oracle, a transaction begins implicitly with the first executable SQL statement in a session. As
queries are executed and commands are issued, the changes stay uncommitted and invisible to other sessions until an
explicit COMMIT statement is given (DDL statements commit implicitly).
• ORGANIZATION OF DATABASES: MS SQL Server organizes all objects, such as tables, views and procedures, by
database name. Users are mapped to a login that is granted access to a specific database and its objects.
In Oracle, database objects are grouped by schemas, each a named collection of objects owned by a database user;
objects can be shared across schemas and users, subject to the privileges and roles granted.
• REVERSE ENGINEERING OF DATABASE: In Oracle, tools such as TOAD are used to reverse engineer a database, while
MS SQL has built-in functionality for the same task.
• WORKING WITH TRIGGERS: MS SQL takes a set-based approach. A trigger fires once per statement, and the rows affected
by a data modification (insert, update, delete) are exposed through the inserted and deleted tables. Oracle has BEFORE
and AFTER triggers, and a trigger can be defined to execute per row or per statement. Oracle does not let a per-row
trigger access other rows of the same table because, conceptually, the trigger fires for each row while the table is
still in the process of being modified.
• WORKING WITH BLOB: Oracle stores images in the BLOB datatype and shows the value as (BLOB) when it is retrieved,
whereas MS SQL uses the VARBINARY(MAX) datatype and shows the retrieved value as a string of binary
characters.
• SYNTACTICAL DIFFERENCES BETWEEN THE TWO.
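Two of the points above, explicit transaction boundaries and per-row triggers, can be tried out in a small sketch. It runs on SQLite through Python's `sqlite3` module, not on MS SQL or Oracle, so it shows the concepts rather than either product's syntax. The `RATING_AUDIT` table, the trigger name and the `update_ratings` helper are hypothetical names introduced for the example.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode: each
# statement commits on its own unless a transaction is opened explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE MOVIE (MovieID INTEGER PRIMARY KEY, MovieName TEXT NOT NULL, MovieRating REAL);
-- Hypothetical audit table, not part of the listed schema.
CREATE TABLE RATING_AUDIT (MovieID INTEGER, OldRating REAL, NewRating REAL);
-- SQLite triggers are row-level, similar in spirit to an Oracle per-row AFTER trigger.
CREATE TRIGGER trg_rating_audit AFTER UPDATE OF MovieRating ON MOVIE
FOR EACH ROW
BEGIN
    INSERT INTO RATING_AUDIT VALUES (OLD.MovieID, OLD.MovieRating, NEW.MovieRating);
END;
""")
conn.execute("INSERT INTO MOVIE VALUES (1, 'Movie A', 7.0)")  # committed immediately


def update_ratings(conn, changes):
    """Apply all rating changes as one unit: commit all of them or none."""
    conn.execute("BEGIN")
    try:
        for movie_id, rating in changes:
            if not 1 <= rating <= 10:
                raise ValueError(f"rating out of range: {rating}")
            conn.execute("UPDATE MOVIE SET MovieRating = ? WHERE MovieID = ?",
                         (rating, movie_id))
        conn.execute("COMMIT")
        return True
    except Exception:
        # Undo every change made since BEGIN, including trigger inserts.
        conn.execute("ROLLBACK")
        return False


ok = update_ratings(conn, [(1, 8.5)])                  # valid, committed
failed = update_ratings(conn, [(1, 9.0), (1, 42.0)])   # second change invalid, all undone
rating = conn.execute("SELECT MovieRating FROM MOVIE WHERE MovieID = 1").fetchone()[0]
audit = conn.execute("SELECT * FROM RATING_AUDIT").fetchall()
print(ok, failed, rating, audit)
```

Because the audit row is written by the trigger inside the same transaction, rolling back the failed batch also removes the audit entry for its first, otherwise valid, update.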
• Designed and created the IMDB database by studying the current version of imdb.com and built an ER
model of it
• Translated the ER model into normalized tables
• Extracted raw data from the IMDB website, analysed it, and populated the tables with this real data
• After loading the data into the databases (MS SQL, Oracle, Hive), the next step was to design an
analysis model, for example estimating a movie’s gross collection on the basis of previous movie
ratings, celebrities, etc.
• Analysed a few general user questions in Tableau, whose data visualizations
help in faster data interpretation
• The model has many applications and is not limited to the samples above.