Impact of Big Data on Spanish Companies



Aimed at executives and analysts at medium and large companies, Big Data Spain held a talk ahead of the second edition of its conference, on 7 and 8 November 2013.


Oscar Méndez, co-founder of […] and […], spoke about Big Data from a business point of view and cleared up doubts about the cost and resources needed to take advantage of this technology.

Post-Hadoop v2.0 platforms allow fast and simple deployment of integrated tools for data mining, data processing, data analysis and data visualization. The advances of the last 12 months leave behind the limitations of traditional Business Intelligence systems.

Published in: Technology
  • Thread of the presentation. THESIS: emergence of Big Data 2.0 (change of paradigm, Big Query). Requirements: 100X. A NON-HADOOP architecture is needed to meet these requirements. OPPORTUNITY: given that it is the only open source NON-HADOOP platform, if the thesis is correct it will be The Open Source Big Data 2.0 Platform.
  • A technological change from Big Data 1.0 to Big Data 2.0: from batch analysis, a 12-year-old technology, to state-of-the-art interactive analysis. This project is based on the thesis that a technological change is under way in the Big Data world, one that requires higher performance, with interactive analysis and real-time query capabilities. Performance 100X higher is required to turn the hours needed by earlier technologies into a few minutes. To achieve these capabilities it is necessary to abandon Hadoop, whose architecture is limited by 12-year-old concepts, such as its need for and dependence on disk persistence and its non-optimized writes, which will not allow the 100X performance requirement to be met. Instead of belatedly following the steps already taken by others, Stratio develops and provides the only open source Big Data platform not based on Hadoop, creating and defining new paradigms and possibilities that have made possible a unique integrated architecture, conceived entirely for the maximum 100X performance required today, adaptable and without vendor lock-in.
  • Impact of Big Data on Spanish Companies

    2. Big Data: is it a real need or just a trend? Why does it apply to my case?
    3. Petabytes: Google 300 PB, Facebook 45 PB, Yahoo! 180 PB. Exabytes: U.S. healthcare. Zettabytes: 1.8 ZB created in 2011; world information 9.57 ZB. Yottabyte, Brontobyte, Geopbyte: still to be reached. "I do not have such a big volume of data": a big European company = terabytes.
    4. But you could have it, or will: an ever-increasing amount of data, and more heterogeneous. Ubiquity, mobility, geolocation, social networks, internet, sensors, M2M, CRMs, call centers, emails, documents, logs, voice…
    5. "There were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days." Google CEO Eric Schmidt
    6. Unstructured or semi-structured data, equal to 85% of available data, is not used by companies. This represents the new fuel for companies.
    7. 83% of the surveyed companies were able to do things with Big Data that seemed impossible to achieve before. “The art of possible”. “Impossible is not a fact, it’s an opinion”.
    8. Value and real ROI are the best KPIs: • Increase in client acquisitions • Increase in sales • Resource optimization • Customer loyalty
    9. You can’t stay stuck in old paradigms
    10. When to use it?
    11. Extract value from data at any point in its life cycle: • Past: stored data, batch mode • Present: current data flows, real time • Future: data and future actions, predictive
    12. Big volume of data; get value from unstructured data; get value from external data; need to reduce processing time or cost; need for real-time analysis of data streams; algorithms, prediction or interactive analysis; transform data into insights and value; transformation into a data-driven company.
    13. Customer Pain
    14. “I know I have to change to Big Data but…” How do I start using it? When? Which technology? How do I acquire the knowledge?
    15. How to use it?
    16. Iterative and cyclical vs. Big Bang: choose a particular use case with a clear ROI and with time and budget limits. Avoid building a generic Big Data system and then implementing projects on top of it.
    17. Which Technology?
    18. A technological change: from Big Data 1.0 (Bigtable) to Big Data 2.0 (Big Query, F1). 12 YEARS GAP.
    19. CUSTOMER SOLUTION: Big Data 2.0 ∙ Up to 100x faster than Big Data 1.0 ∙ Interactive analysis ∙ NoSQL with SQL interface ∙ No need to change the previous way of working
    20. Which technology? Landscape chart of BIG DATA 1.0 and BIG DATA 2.0 products. Categories: NoSQL, stream processing, storage, graph database. Legend: open source; closed, based on open source; closed. Products shown: Stratio, Cloudera Impala, Cloudera CDH4*, Hortonworks HDP*, EMC Pivotal HD, VoltDB, Storm, Microsoft HDInsight, C-Store, Apache HBase, MapR, Apache Drill, Espresso, Apache CouchDB, Scribe, Aurora, SQLStream Platform, Cassandra FS, Apache HDFS, Google Big Query, IBM InfoSphere BigInsights, Datastax Platform, Hadapt platform, Basho Riak, VMWare Redis, HP Vertica, Hstreaming Platform, Apache Giraph, Amazon EMR & Redshift, MapR M3-M5-M7, EMC Greenplum, Voldemort, Apache S4, Apache Flume, Kafka, Neo Technology Neo4j*, Intel Hadoop, Memcached, EsperTech ESPER, Hortonworks Stinger, StreamBase Platform, IBM InfoSphere Streams, FlockDB, EMC Isilon OneFS, Apache Cassandra.
    21. From Big Data 1.0: a batch of new technologies that allow us to extract value out of a dataset which, due to its volume, variety or velocity, was not previously exploited. To Big Data 2.0: “Set of new technologies that extract value from all the available data of a company”.
    22. Use Cases
    23. The Bubble filter
    24. You must enter the user's bubble
    25. Antena 3, nubeox: Big Data recommendation engine; monitoring of streaming videos. Description: Recommendation engine based not only on the customer's purchase history but also on their navigation. Advantages: increase in clickthrough; increased conversions; increase in sales.
    26. Customizing web sites: behavioural customization. Description: Customizing homepages based on user navigation. Analysis and customization of the homepage and site in real time for each user, based on their browsing. Modification of contents, highlights and ads in real time, based on user history. Advantages: over 300% increase in clickthrough; creation of millions of web pages in real time; increased conversions; increase in sales; cost ten times lower than other solutions. Chart: Recommended links, +79% clicks vs. randomly selected; News Interests, +160% clicks vs. one size fits all; Top Searches, +43% clicks vs. editor selected.
    27. Personalized marketing with DataShake integration. Description: Development of newsletters, email marketing or any other mailed material, segmented by individual preferences. Analyzes and takes into account: • Financial information and user data • Navigation and usage information from previous marketing mailings • Mobile app data (GPS, payments, browsing of offers…) • Users’ information from social networks. Advantages: increased clickthrough; increase in conversions and sales; natural language processing – semantics and sentiments; combines private and public data.
    28. Complement private structured data with unstructured and public data. Description: Complementing a company's internal data, both structured and unstructured, with the data generated by the web and social networks allows us to determine the validity of the data about our brand, product or company. Comparing and analysing internal and external (web) data increases the value of our data and gives us an advantage over our competitors. Advantages: improves sales; improves loyalty; increases conversions; detects errors or data manipulation; improves SEO with regard to users and public data; improves marketing and product promotion with regard to trends.
    29. BI and data analytics. Description: Creation and/or complementing of BI and data analytics systems. ETL tools and data loading at much higher volumes than traditional ones. Capacity for analysis and visualization of all types of data, including graphs and new data types. Advantages: ability to work with larger datasets without the need to add or delete; much faster and more reliable systems; massive reduction in cost (M € versus k €); natural language processing – semantics and sentiments; the possibility of combining internal data with external data (private and public data).
    30. Telefónica Dynamic Insights (Smart Steps). Description: Collect mobile data, anonymised and aggregated, to understand how segments of the population collectively behave. Trace trends and the behaviours of crowds, not individuals. Use this insight to enlighten the space between organisations and their users, enabling them to improve their propositions and businesses. Focus: By being able to measure real behaviour, in near real time, 24/7, 365 days a year, we can show the actual impact on society, therefore enabling businesses and local government to make better decisions.
    31. Security and fraud detection. Description: Analysis of large volumes of data, logs, security systems and transactional systems. Faster correlation mechanisms and machine learning algorithms allow early detection of attacks and security risks, with extra care over false positives. Internal fraud detection by analyzing data and events from applications and risk operations. Advantages: combines data from transactional systems with the SIEM to help fight fraud; tracks and identifies new fraud methods and trends via user reviews; fraud detection techniques specified through the use of built-in patterns; much larger data volumes and much higher velocity; combines private and public data.
    32. M2M IoT: PARK AIR SYSTEMS NORWAY (RMMS). Description: The Remote Maintenance & Monitoring System (RMMS) provides a powerful, scalable and flexible SCADA system to perform a wide range of tasks required by CNS agents, such as maintenance, supervision, configuration and operation. Integration of different systems and equipment shall be possible and straightforward using open standard protocols, real-time monitoring, data storage, testing, reporting, events notification,… Focus: The main task of the RMMS is to provide complete access to the supervised equipment in order to monitor every single available parameter, as a means of avoiding personnel mobilization to the remote location. Different levels of control over the system are also provided to cover the requirements of supervision, maintenance and control. Five main elements compose the RMM system: • RCSU: Remote Control and Status Unit. • TP: Tower Panel. • RMM: Remote Management & Monitoring. • LMT/RMT: Local / Remote Management Terminal. • CMMS: Central Management & Monitoring System.
    33. Search Engines. Description: Big Data Search Assist: search engines optimized for Big Data, with self-learning improvements based on use. Search engines for websites, intranets and apps, with instant real-time search, a single box with natural language processing, suggestions, highlighting, automatic corrections, “did you mean” tips, etc. Advantages: easy management for business users (order of results, filters, etc.); advanced search engine features at a cost ten times lower than other solutions; improved performance and scalability compared to other solutions; easy to integrate and use.
    34. ORM and social dialogue. Description: Gives a full 360º view of a company or brand online, through a tool that integrates the three aspects that define your actual online image. How am I doing on social networks? Do I know how to use Facebook, Twitter, Google+, YouTube, LinkedIn? How many followers do I have, am I an influencer, do I generate content that spreads? What is my presence and reputation on the Internet? How do people talk about me, what is said, how does it evolve over time, and what is my position on the Internet relative to my competitors in the aspects that interest me? SEO: simple and practical analysis of both internal and external SEO, to complement and give an integrated view of the above aspects of reputation and social dialogue. Advantages: real improvement of the company or the product by analysing the evolution over time of the three major aspects that define your online reputation; improves the negative aspects and reinforces the positive ones; increase in sales: helps optimize and follow marketing campaigns and improve sales; improves conversions and attracts new customers.
    35. Social Mining. Description: Analyzing various social networks and movements, looking for brand penetration, identifying influencers in conversations and building a static map of associated terms. Advantages: entering the social dialogue and hot topics at the right time multiplies viral spread by 100; view how a social network moves as time goes by; lets you know what users are talking about when they refer to my products or my brand; detection of influencers and detractors; optimal visualization of the information; identification of the tags used most frequently by the network, to improve your SEO.
    36. Social Network Tracking. Description: Search social networks for comments and mentions of interest on a particular issue or event, for further evaluation, influencer detection and graphical display of the conversation to facilitate analysis. Advantages: show a real-time event (symposium, forum, seminar, etc.) with visual information; get opinions and feelings about a topic in social networks in real time; identify the influencers of a hot topic; risk detection and prevention; emotional mining: know which term is most popular for a given person, brand, event, etc., and in this way learn about the feelings generated by the most important terms.
    37. Web Content Scraping. Description: Search the web for content and publications on specific subjects of interest, to detect, filter, collect and process relevant information in semi-real time or in batch. Combined with semantic analysis, this allows contents to be detected and classified effectively. Advantages: allows sites to be generated dynamically, without any intervention or exhaustive searches, from the contents collected and categorized; unifies in a single web site all the tasks that users would otherwise do manually, so it saves them money and generates loyalty.
    38. Tele5: Monitoring of logs for streaming videos. Description: Monitoring the download and streaming of videos. Analysis of streaming; quality of streaming; service peaks and bottlenecks. Advantages: problem detection and alerts; optimization of service; tracking of campaigns.
    39. Massive information tagging. Description: Allows you to label and categorize any type of content or information, automatically and at scale. Advantages: allows searching, categorization and clustering, and extracts value from information that would otherwise be hard to find and use; uses state-of-the-art tools to identify entities (NED systems, NERD), combined with entity disambiguation using a Big Data system containing Wikipedia and other sources of information; processing speed and data volume capacity superior to those of other systems.
    40. SUMMARY
    41. It is not about Big Data, it is about getting maximum value from data: get all the value data can give; process and analyze new types of data (unstructured, semi-structured, streams of data); convert data into big insights; become a data-driven company.
    42. “The best way to predict the future is to create it”
    43. Ride the “Big Data” wave
    44. Q&A
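
The volumes quoted on slides 3 and 5 span several orders of magnitude, so a quick unit check helps put them side by side. The Python sketch below is an illustration added here, not part of the talk. It assumes decimal (SI) units, so that 1 ZB = 1000 EB = 10^6 PB, and a 365-day year. The names UNITS, convert, eb_per_two_days and google_share are hypothetical. It spreads the 1.8 ZB created in 2011 evenly over the year and expresses Google's 300 PB as a share of that total.

```python
# Decimal (SI) byte units. This is an assumption: the slides do not say
# whether decimal or binary prefixes are meant.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value, src, dst):
    """Convert a quantity from one unit in UNITS to another."""
    return value * UNITS[src] / UNITS[dst]


# Slide 3: 1.8 ZB created in 2011, spread evenly over 365 days (assumption).
eb_per_day = convert(1.8, "ZB", "EB") / 365
eb_per_two_days = 2 * eb_per_day

# Slide 3: Google's 300 PB as a fraction of the 1.8 ZB created in 2011.
google_share = convert(300, "PB", "ZB") / 1.8

print(f"Created every two days in 2011: {eb_per_two_days:.1f} EB")
print(f"Google's 300 PB as a share of 1.8 ZB: {google_share:.4%}")
```

Under these assumptions the 2011 figure works out to roughly 9.9 EB every two days, the same order of magnitude as the 5 exabytes every two days in the quote on slide 5.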