Tuesday, May 5, 2020

A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology

Question: Discuss A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology.

Answer:

Literature review

Action research
The researcher has verified eight references and, on that basis, has chosen Oracle as the Hadoop technology company whose tooling will be used to move data between Hadoop and a traditional relational database. Drawing on these research papers, the researcher describes the work of the data analyst and the Hadoop technology stack in detail below.

Oracle Data Integration overview
Oracle data integration is the process of extracting, transforming and loading data into a target database. The process uses a declarative design tool, Oracle Data Integrator (ODI), commonly described as an ELT tool. The integration process is defined by knowledge modules that encapsulate the technical implementation of the data movement. The tool has replaced Oracle Warehouse Builder (Greenwald, Stackowiak & Stern, 2013).

History of the company
According to Boyd-Bowman (2012), Oracle Corporation has long been a leading provider of database, warehousing and related data management technologies. In recent years these technologies have underpinned middleware, big data and cloud-related solutions. ODI, Oracle's strategic data integration platform, was released in July 2010. The product originated from Sunopsis, which Oracle acquired in October 2006. ODI offers an easy-to-use and better approach to satisfying the data integration requirements of Oracle software products.

Advantages of Oracle Data Integration
The Oracle Data Integrator Enterprise Edition environment lets developers concentrate on architecting interfaces, making clear what they need to build and how. This sharply reduces development cost and speeds up delivery of a module or application. The product also guarantees high performance: Oracle Data Integrator runs very well on Oracle Exadata, which provides a highly efficient and effective data integration platform. Oracle Data Integrator is heterogeneous, offering the capability and flexibility to deploy an Oracle database implementation alongside any other database, and it does not waste resources during implementation. Project delivery is further accelerated because Oracle Data Integrator ships with more than 100 extensible data integrations. Finally, Oracle Data Integrator connects strongly with the business intelligence, data warehousing and SOA technologies used to decouple business applications and to transform large data files (Oracle, 2011).

Architecture of Oracle
The Oracle architecture consists of two major parts: the Oracle instance and the Oracle database (Kyte & Kuhn, 2014). The Oracle instance is the means of accessing the Oracle database; one instance opens one database at a time. The instance comprises internal memory structures and background processes. The background processes include PMON, RECO, SMON, DBW0, LGWR, D000, CKPT and others, which are programs that perform input and output work for the Oracle database. They also monitor other Oracle processes to ensure that the database remains reliable and performs as expected.
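As a small, concrete illustration of this architecture, the JDBC sketch below lists the background processes of a running instance and the sizes of its System Global Area components (described next). It is only a sketch: it assumes an Oracle JDBC driver on the classpath, and the connection URL, user and password are hypothetical placeholders, not part of the reviewed material.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal JDBC sketch (assumed setup): inspect the Oracle instance by listing its
// running background processes (V$BGPROCESS) and the size of the SGA (V$SGA).
public class InstanceInspector {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // hypothetical host and service name
        try (Connection conn = DriverManager.getConnection(url, "system", "password");
             Statement stmt = conn.createStatement()) {

            // Background processes that are currently running (PMON, SMON, DBW0, LGWR, ...).
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT name, description FROM v$bgprocess WHERE paddr <> '00'")) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + " - " + rs.getString("description"));
                }
            }

            // Components of the System Global Area and their sizes in bytes.
            try (ResultSet rs = stmt.executeQuery("SELECT name, value FROM v$sga")) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + ": " + rs.getLong("value") + " bytes");
                }
            }
        }
    }
}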
The internal memory structure, the System Global Area (SGA), is allocated whenever the instance starts up and works hand in hand with the background processes. The SGA is a shared memory region that holds the data and control information for a single running instance, and it is deallocated once the instance shuts down. There is also the Program Global Area (PGA), a memory region that belongs to a single process and cannot be shared among processes; it contains the sort area, cursor state, stack space and session information.

The Oracle database part of the architecture consists of operating system files that store an organisation's data. Examples include the redo log files used to recover the database after an application program or instance failure; the archived redo log files used to recover the database after a disk failure; the parameter file that specifies the parameters used to configure the Oracle instance at startup; the password file used to authenticate special database users; and the alert and trace log files that record errors and actions affecting the database configuration.

Figure 1: The Oracle architecture (Source: Hu et al., 2014).

Internship responsibility
The internship will be undertaken in the position of programmer analyst in the field of Oracle data integration. The role carries several designated duties and responsibilities. The intern will take part in defining the ETL architecture within which data integration is performed; integration will run in near real time through batch processing to populate the warehouse, using the Oracle Data Integrator Enterprise Edition application. The intern will also help administer and manage the database and will be responsible for data modelling, including creating the logical and physical models used for the staging, transition and production layers of the warehouse. In addition, the intern will carry out data analysis and design and develop the database environment. The intern will apply the ODI knowledge modules for reverse engineering, loading, checking, integration and services to create interfaces used to clean, load and transform data from the data sources to the target destination. Beyond these duties, the intern will write UNIX scripts to assist with data transformation, create presentation catalogs and folders, and run the Global Consistency Check in the Oracle Business Intelligence Administration Tool.

Proposal

Iteration 1: Orientation and planning
The first step, which the intern has already completed, is orientation at the company. In this iteration all the essential information about the company is gathered. The HR manager also discussed the working hours and the job description. Initially, the intern is provided with all the software and books needed to learn Oracle. Basic concepts are taught by the lead employees, and daily tasks are assigned to demonstrate capability. This iteration is mainly about learning Oracle and gathering all the important information about it.
Another benefit of this iteration is attending the weekly meetings and seminars and taking on small responsibilities in order to adjust to the work environment. By the end, the intern must work towards the goals and objectives of the role set by the HR manager and the project manager (M. Sreedevi, personal communication, January 27, 2016). Finally, the ID card was issued by the manager and a separate workstation in the office was allocated (M. Sreedevi, personal communication, January 28, 2016).

Iteration 2: Understanding the Oracle concepts
An advanced understanding of the Oracle application requires intensive training of about 30 days, which forms the second iteration. During the training the intern will research Oracle data integration, accompanied by reading books and journals. To supplement this material, the intern will consult people already working in the field to clarify the ideas behind the concepts. The training typically starts with the history of Oracle, followed by the Oracle instance and the Oracle database (Y. Venkat, personal communication, January 29, 2016).

Iteration 3: Data analysis and requirement definition
The intern will work hand in hand with the development team to analyse the company data and determine the requirements that will guide the development of the company's data warehouse. During this iteration the intern will conduct a feasibility study with staff and stakeholders of the company to define the information that describes their requirements. By the end, the intern will have gained knowledge of data analysis, field feasibility and the overall determination of system requirements.

Iteration 4: Database design
Once the requirements are identified, the intern will design the database. The design will provide a prototype that helps identify redundancy within the database and apply the appropriate corrections. The design process will involve producing relational diagrams to ensure the whole system meets the defined requirements and complies with company standards. The intern will gain database design experience that gives a view of how the final database system will look. The output will serve as a guide for the follow-up iteration.

Research paper

Big Data Hadoop
Big data is about two basic things: the data itself and analytics. The definition of big data depends largely on storage volume, but it also covers data velocity and data variety. The three basic Vs of big data analytics are therefore volume, variety and velocity; the 5Vs formulation in the title adds veracity and value. Hadoop is an open-source framework written in the Java programming language. It is used for the distributed processing and storage of very large data sets on clusters of computers. It is maintained by the Apache Software Foundation and reached its 1.0 release in 2011. The diagram below shows the ecosystem of the Big Data Hadoop system.

Problems of implementing the Big Data Hadoop system

Challenges
Risk assessment: In every multinational company the adoption of Hadoop carries a very high risk; new and innovative tools can undermine risk management practices throughout the organization.
Real-time business analytics: There is a great mismatch between businesses over where they should allocate the available resources.
In most companies, data and business processes now move faster than the employees who work with them; however useful the technology is, in some places it becomes a bottleneck.
Return on investment: Big Data Hadoop technology incurs high overheads in almost every industry. A company has to sustain a high rate of investment so that investors and stakeholders retain confidence in the company's shares.
Potential of big data: The full potential of big data will soon be unlocked. Ultimately, any company holding personal profiles on its websites will interact with big data in a customised way.

The need to move to Hadoop
Compared with a traditional database, the Hadoop system includes fault-tolerant storage, HDFS, which can capture large amounts of data and information incrementally and keep it durably stored. Additional capabilities, such as handling the volume, variety and velocity of data, timeliness, heterogeneous data and the resilience of MapReduce, are available compared with any traditional database management system.

Overcoming the problems of existing data in a traditional database system
It is important to pair the existing data with an advanced digital solution, that is, a system customised around Big Data Hadoop technology. By maintaining and integrating the database management system, the organization makes the data accessible from different channel sources. Appointing highly qualified, trained employees to maintain the Big Data Hadoop technology is also crucial for the company. Because the technology reduces personal interaction between superiors and subordinates, it is important to follow guidelines that increase the return on investment and reduce both fixed and variable costs.

Architecture of the connection to the database
The diagram below shows how the Hadoop system connects to the traditional database system.
Figure 2: Hadoop architecture (Source: Hu et al., 2014).

How to Sqoop data into the Hadoop Distributed File System (HDFS) from an Oracle database, and the methods used
Figures 3 to 29 walk through the steps of moving data from the traditional relational database system into Hadoop.
Figure 3 (Chen, Mao & Liu, 2014): The Oracle Hadoop environment, showing the Hadoop adapter applications.
Figure 4 (Hu et al., 2014): Two knowledge modules named Movie, which contain a package, mappings, variables and a scenario. The underlying database contains movie data, movie genres and movie cast.
Figure 5 (Hu et al., 2014): The knowledge modules in this step: (1) IKM SQL to HDFS File (Sqoop), designed for importing data; (2) IKM SQL to HDFS Hive (Sqoop), aimed at large data sets; and (3) the knowledge module tasks, which (i) generate the Sqoop script, (ii) add execute permission to the Sqoop script, (iii) execute the Sqoop script, (iv) remove the Sqoop script, and (v) remove the log files. (A rough code sketch of the Sqoop import these tasks perform appears after Figure 7 below.)
Figure 6 (Hu et al., 2014): Configuration of the IKM SQL to HDFS File (Sqoop) and IKM SQL to HDFS Hive (Sqoop) knowledge modules.
Figure 7 (Mazumdar & Dhar, 2015): The knowledge modules are documented and parameterised with ODI variables, which increases their scalability and flexibility.
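The knowledge module tasks listed under Figure 5 essentially generate and run a Sqoop import script that copies an Oracle table into an HDFS directory. The sketch below shows what such an import can look like when driven from Java. It is only an illustrative sketch, not the script the knowledge module actually generates: it assumes the Sqoop 1.x client libraries are on the classpath, and the table name, connection URL, credentials and target directory are hypothetical placeholders.

import org.apache.sqoop.Sqoop;

// Rough sketch (assumed setup): import a hypothetical MOVIE table from Oracle into HDFS,
// the same kind of work the IKM SQL to HDFS File (Sqoop) knowledge module scripts.
public class MovieImport {
    public static void main(String[] args) {
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",  // hypothetical Oracle connection
            "--username", "movie_demo",
            "--password", "secret",
            "--table", "MOVIE",
            "--target-dir", "/user/odi/movie",                    // HDFS directory that receives the files
            "--num-mappers", "4"                                  // Sqoop mappers run the import in parallel
        };
        int exitCode = Sqoop.runTool(sqoopArgs);                  // returns 0 on success
        System.exit(exitCode);
    }
}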
Figure 8 (Mazumdar & Dhar, 2015): The ODI 12c mapping structure for SQL to HDFS files, with two instances of the reusable mappings My_tv_shows and My_dramas that import data from a set of relational database tables into two different target HDFS directories, Tv_shows and Dramas.
Figure 9 (Mazumdar & Dhar, 2015): The filter condition for the target HDFS directory Tv_shows.
Figure 10 (Chen, Mao & Liu, 2014): The reusable ODI mapping My_movies used by the ODI 12c SQL to File (Sqoop) mapping. This reusable mapping uses further components such as joins, datasets, filters and distinct sets; the filter component restricts the data by movie year.
Figure 11 (Mazumdar & Dhar, 2015): The basic ODI mapping with its two deployment specifications, Incremental import and Initial import. Both deployment specifications use the IKM SQL to HDFS File (Sqoop) in the Tv_shows and Dramas datastores. (These two specifications are sketched in code after Figure 17 below.)
Figure 12 (Mazumdar & Dhar, 2015): The deployment specification for the Tv_shows datastore: (1) data is imported in Overwrite mode into a new dataset; (2) a variable named Var_Movie_Year is used as a suffix for the HDFS file name and the temporary object names.
Figure 13 (Mazumdar & Dhar, 2015): The same deployment specification for the Tv_shows datastore, again showing the Overwrite mode and the Var_Movie_Year suffix.
Figure 14 (Chen, Mao & Liu, 2014): Two parallel executions can be designed by selecting the datastores and addressing them from outside the execution group. This normally creates another execution unit for the selected datastores within the same execution group.
Figure 15 (Chen, Mao & Liu, 2014): The Initial deployment specification for the HDFS directories Tv_shows and Dramas. Here the parallel Sqoop imports reduce the time needed to load the data from the relational database into the HDFS directories.
Figure 16 (Chen, Mao & Liu, 2014): The session log for the Initial import deployment specification; the two execution units, Dramas and Tv_shows, ran in parallel.
Figure 17 (Chen, Mao & Liu, 2014): The ODI package design named Pkg_Sql_To_HDFS_File_Initial, which runs the initial SQL to HDFS File scenarios in parallel.
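The Initial and Incremental deployment specifications shown in Figures 11 to 17 correspond, roughly, to two kinds of Sqoop import: a full load that overwrites the target HDFS directory, and an incremental load that appends only the rows added since the last run. The sketch below expresses the two cases as plain Sqoop invocations from Java. It is an assumption-laden illustration rather than the ODI-generated code: the Sqoop 1.x client libraries, the TV_SHOWS table, the RELEASE_YEAR check column and the directory paths are all hypothetical stand-ins for the Tv_shows datastore and the Var_Movie_Year variable.

import org.apache.sqoop.Sqoop;

// Sketch (assumed setup): Initial vs. Incremental import of a hypothetical TV_SHOWS table.
public class TvShowsImport {

    // Initial import: overwrite whatever a previous load left in the target directory.
    static int initialImport(String movieYear) {
        return Sqoop.runTool(new String[] {
            "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
            "--username", "movie_demo", "--password", "secret",
            "--table", "TV_SHOWS",
            "--delete-target-dir",                                 // Overwrite mode
            "--target-dir", "/user/odi/tv_shows_" + movieYear      // variable used as a directory suffix
        });
    }

    // Incremental import: only bring over rows whose check column is newer than the last value loaded.
    static int incrementalImport(String movieYear, String lastValueLoaded) {
        return Sqoop.runTool(new String[] {
            "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
            "--username", "movie_demo", "--password", "secret",
            "--table", "TV_SHOWS",
            "--incremental", "append",
            "--check-column", "RELEASE_YEAR",
            "--last-value", lastValueLoaded,
            "--target-dir", "/user/odi/tv_shows_" + movieYear
        });
    }

    public static void main(String[] args) {
        System.exit(initialImport("2015"));
    }
}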
Figure 18 (Chen, Mao & Liu, 2014): The ODI Operator with three scenarios; the package has completed its execution except for the scenario still in running status. The following figures explain the data sets.
Figure 19 (Chen, Mao & Liu, 2014): The levels of parallelism with ODI and Sqoop, divided into three levels. At level 1, an ODI package launches ODI mappings in parallel. At level 2, parallel sessions run Sqoop mappings against more than one Hive table and HDFS directory. At level 3, the Sqoop mappers themselves import the data sets in parallel.
Figure 20 (Chen, Mao & Liu, 2014): Configuration of the physical schema for the File technology. The file datastore move_file represents the HDFS directory used by the Sqoop knowledge module.
Figure 21 (Chen, Mao & Liu, 2014): Configuration of the physical schema for the Hive technology. The schema directory Movie_demo is the Hive database where the target is physically located.
Figure 22 (Chen, Mao & Liu, 2014): The steps for configuring the Sqoop knowledge module in an ODI 12c mapping.
Figure 23 (Chen, Mao & Liu, 2014): The load knowledge module option, shown by selecting the FILTER_AP option.
Figure 24 (Chen, Mao & Liu, 2014): The properties window of the target datastore Tv_shows; the knowledge module option chosen for the datastore is IKM SQL to HDFS File (Sqoop).
Figure 25 (Chen, Mao & Liu, 2014): For the SQL side, the technology is Oracle and the logical schema is Movie_demo.
Figure 26 (Chen, Mao & Liu, 2014): The staging area settings, with the knowledge module refined by selecting the target area.
Figure 27 (Chen, Mao & Liu, 2014): In the target properties area, the Sqoop integration strategy is set to import.
Figure 28 (Chen, Mao & Liu, 2014): Further steps of the same configuration.

The researcher summarises the MapReduce processing model in the following steps (a minimal Java word-count sketch of these steps follows below): prepare the Map() input; run the user-provided Map() code; shuffle the map output; run the user-provided Reduce() code; produce the final output.

Master/slave architecture
Master/slave architecture is a communication model in which one process has unidirectional control over one or more other systems. Figure 29 illustrates this architecture (Source: Chen, Mao & Liu, 2014).
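To make those Map, shuffle and Reduce steps concrete, here is the classic word-count job written against the Hadoop MapReduce Java API. It is a generic textbook sketch rather than code from the project described above; the input and output paths are supplied on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic word count: the Map step emits (word, 1) pairs, the framework shuffles and
// groups them by word, and the Reduce step sums the counts for each word.
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);          // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();                    // add up the counts shuffled to this word
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}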
Tools for the MapReduce technique over structured and unstructured real-time streaming data
JobTracker: the master node that coordinates resources across the cluster.
TaskTrackers: agents deployed on each machine in the cluster to run the map and reduce tasks.
Job history server: a tracking component that can be deployed as a separate service or together with the JobTracker.
Other mechanisms used include HCatalog, Storm, Hive and Mahout.

Conclusion
In conclusion, Big Data Hadoop technology has improved several technological aspects of the company. By implementing Sqoop, Oracle has bridged Hadoop and the traditional database system and enabled many advanced capabilities. The technology is widely used in multinational industries to validate the relevant approaches. In business, data analytics has become very popular and is changing the emerging sectors of many database-driven companies.

References
Anuradha, J. (2015). A brief introduction on Big Data 5Vs characteristics and Hadoop technology. Procedia Computer Science, 48, 319-324.
Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171-209.
Davenport, T. H., Barth, P., & Bean, R. (2012). How big data is different. MIT Sloan Management Review, 54(1), 43.
Dev, D., & Patgiri, R. (2014, December). Performance evaluation of HDFS in big data management. In High Performance Computing and Applications (ICHPCA), 2014 International Conference on (pp. 1-7). IEEE.
Dittrich, J., & Quiané-Ruiz, J. A. (2012). Efficient big data processing in Hadoop MapReduce. Proceedings of the VLDB Endowment, 5(12), 2014-2015.
Dubey, V., Gupta, S., & Garg, S. (2015). Performing big data over cloud on a test-bed. International Journal of Computer Applications, 120(10).
Fan, W., & Bifet, A. (2013). Mining big data: Current status, and forecast to the future. ACM SIGKDD Explorations Newsletter, 14(2), 1-5.
Ferrando-Llopis, R., Lopez-Berzosa, D., & Mulligan, C. (2013, October). Advancing value creation and value capture in data-intensive contexts. In Big Data, 2013 IEEE International Conference on (pp. 5-9). IEEE.
Garlasu, D., Sandulescu, V., Halcu, I., Neculoiu, G., Grigoriu, O., Marinescu, M., & Marinescu, V. (2013, January). A big data implementation based on Grid computing. In Roedunet International Conference (RoEduNet), 2013 11th (pp. 1-4). IEEE.
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The rise of big data on cloud computing: Review and open research issues. Information Systems, 47, 98-115.
Howson, C., & Hammond, M. (2014). Successful Business Intelligence: Unlock the Value of BI & Big Data. McGraw-Hill Education.
Hu, H., Wen, Y., Chua, T. S., & Li, X. (2014). Toward scalable systems for big data analytics: A technology tutorial. IEEE Access, 2, 652-687.
Kashyap, K., Deka, C., & Rakshit, S. (2014). A review on big data, Hadoop and its impact on business. International Journal of Innovative Research and Development, 3(12).
Katal, A., Wazid, M., & Goudar, R. H. (2013, August). Big data: Issues, challenges, tools and good practices. In Contemporary Computing (IC3), 2013 Sixth International Conference on (pp. 404-409). IEEE.
Kim, G. H., Trimi, S., & Chung, J. H. (2014). Big-data applications in the government sector. Communications of the ACM, 57(3), 78-85.
Landset, S., Khoshgoftaar, T. M., Richter, A. N., & Hasanin, T. (2015). A survey of open source tools for machine learning with big data in the Hadoop ecosystem. Journal of Big Data, 2(1), 1-36.
Liu, P., Wu, Z. F., & Hu, G. Y. (2013). Big data: Profound changes are taking place. ZTE Technol. J., 19(4), 2-7.
Mazumdar, S., & Dhar, S. (2015, March). Hadoop as Big Data operating system--the emerging approach for managing challenges of enterprise Big Data platform. In Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on (pp. 499-505). IEEE.
McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D. J., & Barton, D. (2012). Big data: The management revolution. Harvard Business Review, 90(10), 61-67.
O'Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). Big data, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774-781.
O'Leary, D. E. (2013). Artificial intelligence and big data. IEEE Intelligent Systems, (2), 96-99.
Padhy, R. P. (2013). Big data processing with Hadoop-MapReduce in cloud systems. International Journal of Cloud Computing and Services Science, 2(1), 16.
Prajapati, V. (2013). Big Data Analytics with R and Hadoop. Packt Publishing Ltd.
Purcell, B. (2013). The emergence of "big data" technology and analytics. Journal of Technology Research, 4, 1.
Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2(1), 3.
Rani, G., & Kumar, S. (2015). Hadoop technology to analyze big data.
Sharma, M., Hasteer, N., Tuli, A., & Bansal, A. (2014, September). Investigating the inclinations of research and practices in Hadoop: A systematic review. In Confluence The Next Generation Information Technology Summit (Confluence), 2014 5th International Conference (pp. 227-231). IEEE.
Stich, V., Jordan, F., Birkmeier, M., Oflazgil, K., Reschke, J., & Diews, A. (2015). Big data technology for resilient failure management in production systems. In Advances in Production Management Systems: Innovative Production Management Towards Sustainable Growth (pp. 447-454). Springer International Publishing.
Sun, Z., Chen, F., Chi, M., & Zhu, Y. (2015). A Spark-based big data platform for massive remote sensing data processing. In Data Science (pp. 120-126). Springer International Publishing.
Suthaharan, S. (2014). Big data classification: Problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Performance Evaluation Review, 41(4), 70-73.
Zhao, J., Wang, L., Tao, J., Chen, J., Sun, W., Ranjan, R., ... & Georgakopoulos, D. (2014). A security framework in G-Hadoop for big data computing across distributed Cloud data centres. Journal of Computer and System Sciences, 80(5), 994-1007.
Zuech, R., Khoshgoftaar, T. M., & Wald, R. (2015). Intrusion detection and big heterogeneous data: A survey. Journal of Big Data, 2(1), 1-41.
