
Tutorial #1 - Rapid prototyping a streaming analysis solution
Objectives
The key objective of the lab is to illustrate how to set up a development environment and rapidly
prototype a streaming analysis solution. Using a simple example you will learn how to:

Set up an HSDP working environment for your solution

Configure an analytic query group that analyzes an input data set and generates the desired insight output

Run the solution

Prerequisite reading
It is highly recommended to review and refer to the workshop document to get a good conceptual
background before starting this tutorial.

HSDP 3.0 Workshop Document Reference

Workshop #1 - Installing HSDP for design time activities


Workshop #2 - Designing and configuring analytic queries

HSDP Manual Reference:

Setting up the HSDP working directory: Hitachi Streaming Data Platform, Setup and Configuration
Guide (MK-93HSDP000-04), Chapter 1. Creating working directories
Configuring the analytic query group: Hitachi Streaming Data Platform, Setup and Configuration
Guide (MK-93HSDP000-04), Chapter 2. Developing analysis scenarios using CQL
Example of External Adapter: Hitachi Streaming Data Platform, Setup and Configuration Guide
(MK-93HSDP000-04), Chapter 3. Developing external adapters
Testing analysis scenarios in local mode: Hitachi Streaming Data Platform, Application
Development Guide (MK-93HSDP001-04), Chapter 6. Working with the CQL debugging tool
Command Reference: Hitachi Streaming Data Platform, Setup and Configuration Guide (MK-93HSDP000-04), Chapter 5.
CQL Reference: Hitachi Streaming Data Platform, Application Development Guide (MK-93HSDP001-04), Chapter 4.

Setting up the environment to run the tutorial


Installing HSDP
Install the HSDP 3.0 Development package. Please refer to the HSDP 3.0 Installation document.

Note: If you are doing the tutorial from the HDS Global Demonstration and Learning Lab (GDLL),
or on the Hitachi Cloud account, you can skip installing HSDP. HSDP is typically installed
under /opt/hitachi/hsdp.

Installing the tutorial


Get the hsdp_rapid_prototyping_tutorial.tar.gz file. This contains the data files required for the
lab.
Note: If you are doing the tutorial from the HDS Global Demonstration and Learning Lab
(GDLL), then the tar file is available in your home directory.
Extract the data file from the hsdp_rapid_prototyping_tutorial.tar.gz file into a temporary
directory. This tar file contains the sample data file that will be used during the lab.
$ tar -zxvf hsdp_rapid_prototyping_tutorial.tar.gz -C <temporary-directory>
The extracted data file is shown below:
<temporary-directory>/SMART_METER_STREAM.csv

High-Level Description of the tutorial


In this lab tutorial, we will use a sample dataset and execute analytic queries to derive insights from
it. This lab is very useful for getting started with HSDP and for rapidly building a proof of concept.
By following a simple set of conventions, you can quickly prototype insights from the data. A key
point to note is that you are simulating a streaming scenario without having to spend a lot of time
and effort coding the solution.
In this tutorial we demonstrate the procedure to capture a trend in the data set. The sample data set
has data tuples (records) from a smart meter. Each tuple has a time stamp, meter id, city code and
the energy utilized every second. The objective is to understand the energy utilization trend by
computing the moving average utilization over a period of 10 minutes.
One of the core strengths of HSDP is computing moving averages on large volumes of data. By
computing moving averages, the underlying trends are better understood and noise is eliminated.
The query listed below (Step 2) computes the moving average as the data flows through HSDP.
When the solution is run, it reads the energy data from an input file, performs the computation and
writes the insights to an output file.
The following schematic diagram illustrates the solution.
Sample Data  →  HSDP  →  Insights
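To make this data flow concrete, here is a minimal sketch in plain Python (not HSDP code): an in-memory stand-in for the input file, a per-city aggregation standing in for the HSDP stage, and a CSV writer producing the insights. All sample values below are illustrative.

```python
import csv
import io
from collections import defaultdict

# Stand-in for ./input/SMART_METER_STREAM.csv: ts, meter_id, city_code, elec_util
input_file = io.StringIO(
    "5/30/2016 0:00,1,BG,51\n"
    "5/30/2016 0:00,2,BB,51\n"
    "5/30/2016 0:01,3,BG,61\n"
)

# "HSDP" stage: aggregate energy utilization per city code
totals = defaultdict(lambda: [0, 0])  # city_code -> [sum, count]
for ts, meter_id, city_code, elec_util in csv.reader(input_file):
    totals[city_code][0] += int(elec_util)
    totals[city_code][1] += 1

# "Insights" stage: write per-city averages to an output buffer
output_file = io.StringIO()
writer = csv.writer(output_file)
for city_code, (total, count) in sorted(totals.items()):
    writer.writerow([city_code, total / count])
```

The real solution adds the windowing and timestamp handling that HSDP provides; this sketch only shows the read-compute-write shape of the pipeline.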

Running the tutorial


Step 1 Setting up the HSDP working directory
Create a working directory by running the following command. The HSDP working directory is essential
for development, testing and, finally, packaging the solution for deployment.
$ /opt/hitachi/hsdp/bin/hsdpsetup -dir /<full-path-of-any-directory>/hsdp_rapid_prototype_tutorial
Go to the hsdp_rapid_prototype_tutorial/ directory and observe the contents. All the lab work
will be done under this directory.
A brief explanation of the directories:

<HSDP-working-directory>/
  bin        HSDP tools and commands
  conf       Directory for configuration files for query groups
  exadaptor  Directory for external adapters
  example    Directory for sample code
  inadaptor  Directory for internal adapters
  lib        Directory for extension libraries
  logs       HSDP-generated log files
  query      Directory for query groups
  spool      Contains system files that the SDP server uses
  trc        Contains trace files that HSDP outputs

Step 2 Configuring the analytic query group


Step 2.1 Create a CQL file named smart_meter_quick_prototype.cql in the ./query/ directory:
The CQL code below illustrates:

Defining a stream that reflects the schema of the input record or tuple
Computing the moving average of energy utilization per city. In this example we compute over a
window of 10 minutes and output the insights every 10 minutes.

// Define the stream
register stream Smart_meter_stream (
  ts timestamp(9), meter_id varchar(4), city_code varchar(2), elec_util int);

// Compute the moving average
register query Smart_meter_stream_output rstream [10 minute] (
  select city_code, avg(elec_util) as elec_util
  from Smart_meter_stream [range 10 minute]
  group by city_code);
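As a rough illustration of the windowing semantics (not HSDP code), the Python sketch below treats the query as a tumbling 10-minute window; this matches the CQL above because the [range 10 minute] window and the rstream [10 minute] output interval are equal. The tuples are made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative tuples: (timestamp, meter_id, city_code, elec_util)
tuples = [
    (datetime(2016, 5, 30, 0, 0, 0), "1", "BG", 51),
    (datetime(2016, 5, 30, 0, 5, 0), "2", "BG", 61),
    (datetime(2016, 5, 30, 0, 0, 0), "3", "CC", 40),
    (datetime(2016, 5, 30, 0, 12, 0), "1", "BG", 71),
]

def windowed_avg(tuples, window=timedelta(minutes=10)):
    """Average elec_util per city over fixed 10-minute windows,
    mimicking rstream [10 minute] ... [range 10 minute] group by city_code."""
    start = min(ts for ts, *_ in tuples)
    buckets = defaultdict(list)  # (window_index, city_code) -> utilizations
    for ts, _meter_id, city_code, elec_util in tuples:
        buckets[(int((ts - start) / window), city_code)].append(elec_util)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}
```

With these tuples, the two BG readings in the first 10 minutes average to 56.0, while the 0:12 reading falls into the next window.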

Step 2.2 Create a query-group-properties file named smart_meter_quick_prototype.qg in
the ./conf/ directory as follows:
Every query group has an associated query-group property file, which defines various parameters
that control the behavior of the query group's execution. For example, this query group has the
timestamp mode set to DataSource, which tells the HSDP engine to use the timestamp in the data file
when performing the computations.
# Specify the path to a query definition file that defines a query group. (Mandatory)
querygroup.cqlFilePath=query/smart_meter_quick_prototype.cql
# Specify the timestamp mode. Default is Server mode.
stream.timestampMode=DataSource
# Specify the unit of timestamp adjustment when the DataSource timestamp mode is set.
stream.timestampAccuracy=unuse
# Specify the name of the time data column in the stream schema. If there is more
# than one stream, all input streams must have this column.
stream.timestampPosition=ts

Step 3 Preparing the data


The data for the lab is already prepared. Create a directory named input under the working
directory and copy the data file from the temporary directory to the ./input/ directory, giving
./input/SMART_METER_STREAM.csv. View the data file; a sample is provided below:
Timestamp       Meter ID  City Code  Energy Utilization
5/30/2016 0:00  1         BG         51
5/30/2016 0:00  2         BB         51
5/30/2016 0:00  3         CC         51

As a convention,

The name of the input file should be the same as the name of the stream into which the data is
ingested.
The name of the file should be in uppercase.
The data in the file should be in CSV format reflecting the schema of the stream (each
column separated by a comma character ,).
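These conventions can be checked mechanically. The helper below is a hypothetical sketch, not an HSDP tool; it assumes the stream name and schema from the CQL in Step 2.

```python
import csv
from pathlib import Path

STREAM_NAME = "Smart_meter_stream"  # from the CQL in Step 2
STREAM_SCHEMA = ["ts", "meter_id", "city_code", "elec_util"]

def check_input_file(path):
    """Verify the tutorial's input-file conventions for one data file."""
    p = Path(path)
    # File name must be the stream name, in uppercase, with a .csv suffix.
    assert p.suffix == ".csv", "input file must be a CSV file"
    assert p.stem == STREAM_NAME.upper(), "file name must match the uppercase stream name"
    # Each CSV row must have one value per schema column.
    with open(p, newline="") as f:
        for row in csv.reader(f):
            assert len(row) == len(STREAM_SCHEMA), f"bad row: {row}"
    return True
```

Running such a check before Step 4 can save a debugging round-trip if a file was misnamed or malformed.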

Step 4 Running the analysis and examining the output


Step 4.1 Create a directory named output in the working directory.
Step 4.2 Start the SDP manager by running the following command with root privileges. The SDP
manager starts all the background processes required for running the HSDP server.
$ sudo /opt/hitachi/hsdp/bin/hsdpmanager -start

Step 4.3 To test the analysis query, run the following command from the hsdp_rapid_prototype_tutorial/ directory.
$ ./bin/hsdpcqldebug smart_meter_quick_prototype -i input -o output
Running the command does the following:

Start the HSDP server and execute the query on the incoming data stream.
Read the input file and stream the data into the HSDP server.
Write the insights obtained (the moving average) to an output file.

Step 4.4 View the analysis results in the following file:


$ more ./output/SMART_METER_STREAM_OUTPUT.csv
Note: As a convention, the names and format of the output files that are created are similar to
those of the input files, as discussed in Step 3.

Step 5 Cleaning up
Step 5.1 Remove the output files and log files by running the following command.
$ rm -rf ./output/* ./logs/* ./trc/*
Step 5.2 Stop the SDP manager by running the following command with root privileges.
$ sudo /opt/hitachi/hsdp/bin/hsdpmanager -stop

Additional lab exercises:


The following exercises are provided to reinforce your learning of the product features.
Add a new query to compute the sum of utilization over the past 10 minutes, grouping the results
by city. Check your answer by comparing your query with the following query:
register stream Smart_meter_stream (
  ts timestamp(9), meter_id varchar(4), city_code varchar(2), elec_util int);

register query Smart_meter_stream_output rstream [10 minute] (
  select city_code, sum(elec_util) as elec_util
  from Smart_meter_stream [range 10 minute]
  group by city_code);

Run the lab again with the modifications mentioned earlier and observe the output by running the
following command:
$ ./bin/hsdpcqldebug smart_meter_quick_prototype -i input -o output

Debugging tips
Examine the following log files when debugging issues while running the labs.
Log file name                                     Purpose
./logs/SDPServerMessage<n>.log                    Debugging CQL issues
/var/log/hitachi/hsdp/ManagerMessage<n>.log       Log file for the SDP manager
/var/log/hitachi/hsdp/CoordinatorMessage<n>.log   Log file for the SDP Coordinator
/var/log/hitachi/hsdp/BrokerMessage<n>.log        Log file for the SDP Broker

End Tutorial.
