
            Course catalog: Big Data Business Intelligence for Govt. Agencies Training
            Course outline:


            Each session is 2 hours
            Day-1: Session-1: Business Overview of Why Big Data Business Intelligence in Govt.
            Case Studies from NIH, DoE
            Big Data adoption rate in Govt. agencies and how they are aligning their future operations around Big Data predictive analytics
            Broad Scale Application Area in DoD, NSA, IRS, USDA etc.
            Interfacing Big Data with Legacy data
            Basic understanding of enabling technologies in predictive analytics
            Data Integration & Dashboard visualization
            Fraud management
            Business Rule/ Fraud detection generation
            Threat detection and profiling
            Cost benefit analysis for Big Data implementation
            Day-1: Session-2 : Introduction of Big Data-1
            Main characteristics of Big Data: volume, variety, velocity and veracity. MPP architecture for volume.
            Data Warehouses – static schema, slowly evolving dataset
            MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc.
            Hadoop Based Solutions – no conditions on structure of dataset.
            Typical pattern : HDFS, MapReduce (crunch), retrieve from HDFS
            Batch- suited for analytical/non-interactive
            Velocity: CEP streaming data
            Typical choices – CEP products (e.g. InfoSphere Streams, Apama, MarkLogic etc.)
            Less production ready – Storm/S4
            NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database
            Day-1 : Session -3 : Introduction to Big Data-2
            NoSQL solutions
            KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
            KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB
            KV Store (Hierarchical) - GT.M, Caché
            KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
            KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta
            Tuple Store - Gigaspaces, Coord, Apache River
            Object Database - ZopeDB, db4o, Shoal
            Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
            Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI
            Varieties of data: introduction to data cleaning issues in Big Data
            RDBMS – static structure/schema; does not promote an agile, exploratory environment.
            NoSQL – semi-structured; enough structure to store data without defining an exact schema first
            Data cleaning issues
            Day-1 : Session-4 : Big Data Introduction-3 : Hadoop
            When to select Hadoop?
            STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration)
            SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB)
            Warehousing data = HUGE effort and static even after implementation
            For variety & volume of data, crunched on commodity hardware – HADOOP
            Commodity H/W needed to create a Hadoop Cluster
            Introduction to Map Reduce /HDFS
            MapReduce – distribute computing over multiple servers
            HDFS – make data available locally for the computing process (with redundancy)
            Data – can be unstructured/schema-less (unlike RDBMS)
            Developer responsibility to make sense of data
            Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS
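            The MapReduce pattern introduced in this session can be illustrated with a minimal single-process sketch. It is plain Python rather than Hadoop's Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen only for this illustration. On a real cluster the map and reduce steps run on many servers and the framework performs the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data lake"]))
```

            Because each map call sees only one line and each reduce call sees only one key, both steps can be spread over multiple servers without shared state.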
            Day-2: Session-1: Big Data Ecosystem-Building Big Data ETL: universe of Big Data Tools-which one to use and when?
            Hadoop vs. Other NoSQL solutions
            For interactive, random access to data
            HBase (column oriented database) on top of Hadoop
            Random access to data but restrictions imposed (max 1 PB)
            Not good for ad-hoc analytics, good for logging, counting, time-series
            Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access)
            Flume – Stream data (e.g. log data) into HDFS
            Day-2: Session-2: Big Data Management System
            Moving parts, compute nodes start/fail: ZooKeeper – for configuration/coordination/naming services
            Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain
            Deploy, configure, cluster management, upgrade etc. (sys admin): Ambari
            In the cloud: Whirr
            Day-2: Session-3: Predictive analytics in Business Intelligence -1: Fundamental Techniques & Machine learning based BI :
            Introduction to Machine learning
            Learning classification techniques
            Bayesian prediction: preparing the training file
            Support Vector Machine
            KNN p-Tree Algebra & vertical mining
            Neural Network
            Big Data large variable problem -Random forest (RF)
            Big Data Automation problem – Multi-model ensemble RF
            Automation through Soft10-M
            Text analytic tool: Treeminer
            Agile learning
            Agent based learning
            Distributed learning
            Introduction to open source tools for predictive analytics: R, RapidMiner, Mahout
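            As a small illustration of the classification techniques listed in this session, the sketch below implements plain k-nearest-neighbours (KNN) with Euclidean distance and a majority vote. It is the textbook algorithm, not the p-Tree algebra variant named above, and the function name knn_predict and the sample data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples
    of the same length as `query`.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up two-feature training set, for illustration only.
    training_data = [
        ((1.0, 1.0), "normal"),
        ((1.2, 0.9), "normal"),
        ((0.8, 1.1), "normal"),
        ((5.0, 5.2), "suspicious"),
        ((5.1, 4.9), "suspicious"),
    ]
    print(knn_predict(training_data, (4.8, 5.0), k=3))
```

            Sorting the whole training set for every query is fine for a demonstration but scales poorly, which is one reason large-scale variants use indexing or vertical data structures.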
            Day-2: Session-4 Predictive analytics eco-system-2: Common predictive analytic problems in Govt.
            Insight analytic
            Visualization analytic
            Structured predictive analytic
            Unstructured predictive analytic
            Threat/fraudster/vendor profiling
            Recommendation Engine
            Pattern detection
            Rule/Scenario discovery –failure, fraud, optimization
            Root cause discovery
            Sentiment analysis
            CRM analytic
            Network analytic
            Text Analytics
            Technology assisted review
            Fraud analytic
            Real Time Analytic
            Day-3 : Session-1 : Real Time and Scalable Analytic Over Hadoop
            Why common analytic algorithms fail in Hadoop/HDFS
            Apache Hama – for bulk synchronous parallel (BSP) distributed computing
            Apache Spark – cluster computing for real-time analytics
            CMU GraphLab 2 – graph-based asynchronous approach to distributed computing
            KNN p-Algebra based approach from Treeminer for reduced hardware cost of operation
            Day-3: Session-2: Tools for eDiscovery and Forensics
            eDiscovery over Big Data vs. Legacy data – a comparison of cost and performance
            Predictive coding and technology assisted review (TAR)
            Live demo of a TAR product (vMiner) to understand how TAR works for faster discovery
            Faster indexing through HDFS – velocity of data
            Natural language processing (NLP) – various techniques and open source products
            eDiscovery in foreign languages: technology for foreign language processing
            Day-3 : Session 3: Big Data BI for Cyber Security – a 360-degree view from rapid data collection to threat identification
            Understanding the basics of security analytics: attack surface, security misconfiguration, host defenses
            Network infrastructure/ Large datapipe / Response ETL for real time analytic
            Prescriptive vs predictive – fixed rule based vs auto-discovery of threat rules from metadata
            Day-3: Session 4: Big Data in USDA : Application in Agriculture
            Introduction to IoT (Internet of Things) for agriculture: sensor-based Big Data and control
            Introduction to Satellite imaging and its application in agriculture
            Integrating sensor and image data for soil fertility, cultivation recommendations and forecasting
            Agriculture insurance and Big Data
            Crop Loss forecasting
            Day-4 : Session-1: Fraud prevention BI from Big Data in Govt.: Fraud analytics
            Basic classification of fraud analytics: rule based vs predictive analytics
            Supervised vs unsupervised Machine learning for Fraud pattern detection
            Vendor fraud/over charging for projects
            Medicare and Medicaid fraud: fraud detection techniques for claim processing
            Travel reimbursement frauds
            IRS refund frauds
            Case studies and live demos will be given wherever data is available.
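            The distinction between rule-based and unsupervised fraud detection covered in this session can be sketched as follows. This is a minimal illustration, assuming claim amounts are the only feature. The function names, the 3.5 cut-off and the sample amounts are hypothetical. The unsupervised detector uses the modified z-score based on the median absolute deviation (MAD), which is one common outlier test among many.

```python
from statistics import median


def rule_based_flags(claims, limit):
    """Fixed rule: flag every claim above a hand-set limit."""
    return [amount for amount in claims if amount > limit]


def outlier_flags(claims, threshold=3.5):
    """Unsupervised: flag claims whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, so no labelled
    fraud examples and no hand-set amount limit are needed.
    """
    center = median(claims)
    mad = median(abs(amount - center) for amount in claims)
    if mad == 0:
        return []  # no spread in the data, nothing can be ranked as unusual
    return [a for a in claims if 0.6745 * abs(a - center) / mad > threshold]


if __name__ == "__main__":
    # Made-up reimbursement amounts, for illustration only.
    claims = [120, 135, 128, 140, 131, 125, 980]
    print(rule_based_flags(claims, limit=1000))
    print(outlier_flags(claims))
```

            With these made-up figures the fixed limit of 1000 lets the 980 claim through, while the outlier test flags it because it is far from the typical amount.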
            Day-4 : Session-2: Social Media Analytic- Intelligence gathering and analysis
            Big Data ETL API for extracting social media data
            Text, image, metadata and video
            Sentiment analysis from social media feed
            Contextual and non-contextual filtering of social media feed
            Social Media Dashboard to integrate diverse social media
            Automated profiling of social media profiles
            A live demo of each analytic will be given using the Treeminer tool.
            Day-4 : Session-3: Big Data Analytic in image processing and video feeds
            Image storage techniques in Big Data: storage solutions for data exceeding petabytes
            LTFS and LTO
            GPFS-LTFS (layered storage solution for Big image data)
            Fundamentals of image analytics
            Object recognition
            Image segmentation
            Motion tracking
            3-D image reconstruction
            Day-4: Session-4: Big Data applications in NIH:
            Emerging areas of Bio-informatics
            Meta-genomics and Big Data mining issues
            Big Data Predictive analytic for Pharmacogenomics, Metabolomics and Proteomics
            Big Data in downstream Genomics process
            Application of Big data predictive analytics in Public health
            Big Data Dashboard for quick accessibility of diverse data and display :
            Integration of existing application platform with Big Data Dashboard
            Big Data management
            Case Study of Big Data Dashboard: Tableau and Pentaho
            Use Big Data app to push location based services in Govt.
            Tracking system and management
            Day-5 : Session-1: How to justify Big Data BI implementation within an organization:
            Defining ROI for Big Data implementation
            Case studies of saving analyst time in data collection and preparation: productivity gains
            Case studies of revenue gain from saving the licensed database cost
            Revenue gain from location based services
            Saving from fraud prevention
            An integrated spreadsheet approach to calculate approx. expense vs. Revenue gain/savings from Big Data implementation.
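            The spreadsheet-style comparison of expense against gains and savings can be sketched in a few lines. This is a minimal illustration, assuming ROI is defined as (total benefit - total cost) / total cost. The function name, the line items and all figures are hypothetical and are not taken from the course material.

```python
def roi(costs, benefits):
    """Return ROI as a fraction: (total benefit - total cost) / total cost.

    `costs` and `benefits` map line-item names to amounts in the same currency.
    """
    total_cost = sum(costs.values())
    total_benefit = sum(benefits.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_benefit - total_cost) / total_cost


if __name__ == "__main__":
    # Made-up annual figures, for illustration only.
    costs = {"hardware": 200_000, "licenses": 50_000, "staff": 250_000}
    benefits = {
        "analyst_time_saved": 300_000,
        "database_license_savings": 150_000,
        "fraud_prevented": 200_000,
    }
    print(f"ROI: {roi(costs, benefits):.0%}")
```

            Keeping costs and benefits as named line items mirrors a spreadsheet layout, so individual estimates can be revised without touching the calculation.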
            Day-5 : Session-2: Step-by-step procedure to replace a legacy data system with a Big Data system:
            Understanding a practical Big Data migration roadmap
            What information is needed before architecting a Big Data implementation
            Different ways of calculating the volume, velocity, variety and veracity of data
            How to estimate data growth
            Case studies
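            One simple way to estimate data growth is a compound-growth projection. The sketch below assumes a constant annual growth rate, which real systems rarely follow exactly. The function name and the figures are hypothetical.

```python
def project_volume(current_tb, annual_growth_rate, years):
    """Project storage volume (TB) per year under constant compound growth.

    Returns a list starting with year 0 (today) and ending with year `years`.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return [current_tb * (1 + annual_growth_rate) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Made-up starting point: 100 TB growing 50% per year over 3 years.
    for year, volume in enumerate(project_volume(100, 0.5, 3)):
        print(f"year {year}: {volume:.1f} TB")
```

            Running the projection with a low and a high growth rate gives a range rather than a single number, which is usually a safer basis for sizing a cluster.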
            Day-5: Session 4: Review of Big Data Vendors and review of their products. Q/A session:
            Accenture
            APTEAN (Formerly CDC Software)
            Cisco Systems
            Cloudera
            Dell
            EMC
            GoodData Corporation
            Guavus
            Hitachi Data Systems
            Hortonworks
            HP
            IBM
            Informatica
            Intel
            Jaspersoft
            Microsoft
            MongoDB (Formerly 10Gen)
            Mu Sigma
            NetApp
            Opera Solutions
            Oracle
            Pentaho
            Platfora
            QlikTech
            Quantum
            Rackspace
            Revolution Analytics
            Salesforce
            SAP
            SAS Institute
            Sisense
            Software AG/Terracotta
            Soft10 Automation
            Splunk
            Sqrrl
            Supermicro
            Tableau Software
            Teradata
            Think Big Analytics
            Tidemark Systems
            Treeminer
            VMware (Part of EMC)
